Tell us about your PDF experience.


Azure OpenAI Service Documentation

Learn how to use Azure OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series for content generation, summarization, semantic search, and natural language to code translation.



O V E R V I E W

What is Azure OpenAI Service?


Q U I C K S T A R T

Quickstarts



H O W - T O G U I D E

Create a resource


T U T O R I A L

Embeddings



H O W - T O G U I D E

Completions


T R A I N I N G

Intro to Azure OpenAI training



C O N C E P T

Azure OpenAI Models


R E F E R E N C E

Support and help options


Additional resources


Azure OpenAI

Azure OpenAI Studio Region support Quotas and limits

Apply for access to Azure OpenAI

Video

Combining OpenAI models with the power of Azure


Reference

REST API

Terms of use

Tools

Azure CLI PowerShell

What is Azure OpenAI Service?

Article • 05/01/2023


Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series. In addition, the new GPT-4 and ChatGPT (gpt-35-turbo) model series are now available in preview. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio.


Features overview


Feature

Azure OpenAI

Models available

NEW GPT-4 series (preview)


GPT-3 base series


NEW ChatGPT (gpt-35-turbo) (preview)


Codex series


Embeddings series


Learn more in our Models page.

Fine-tuning

Ada


Babbage


Curie


Cushman*


Davinci*


* Currently unavailable. **East US and West Europe Fine-tuning is currently


unavailable to new customers. Please use US South Central for US based


training

Price

Available here

Virtual network support & private link support

Yes

Managed Identity

Yes, via Azure Active Directory

UI experience

Azure Portal for account & resource management,

Azure OpenAI Service Studio for model exploration and fine tuning

Regional availability

East US

South Central US West Europe



Feature Azure OpenAI

Content filtering Prompts and completions are evaluated against our content policy with automated systems. High severity content will be filtered.


Responsible AI

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use , building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers.


How do I get access to Azure OpenAI?

How do I get access to Azure OpenAI?


Access is currently limited as we navigate high demand, upcoming product improvements, and Microsoft’s commitment to responsible AI . For now, we're working with customers with an existing partnership with Microsoft, lower risk use cases, and those committed to incorporating mitigations.

More specific information is included in the application form. We appreciate your patience as we work to responsibly enable broader access to Azure OpenAI.

Apply here for access:


Apply now


Comparing Azure OpenAI and OpenAI

Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-4, GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.

With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers private networking, regional availability, and responsible AI content filtering.


Key concepts

Prompts & Completions

The completions endpoint is the core component of the API service. This API provides access to the model's text-in, text-out interface. Users simply need to provide an input prompt containing the English text command, and the model will generate a text completion.

Here's an example of a simple prompt and completion:


""" count to 5 in a for loop """

Prompt:


for i in range(1, 6): print(i)

Completion:


Tokens

Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word “hamburger” gets broken up into the tokens “ham”, “bur” and “ger”, while a short and common word like “pear” is a single token. Many tokens start with a whitespace, for example “ hello” and “ bye”.

The total number of tokens processed in a given request depends on the length of your input, output and request parameters. The quantity of tokens being processed will also affect your response latency and throughput for the models.


Resources

Azure OpenAI is a new product offering on Azure. You can get started with Azure OpenAI the same way as any other Azure product where you create a resource, or instance of the service, in your Azure Subscription. You can read more about Azure's resource management design.


Deployments

Once you create an Azure OpenAI Resource, you must deploy a model before you can start making API calls and generating text. This action can be done using the

Deployment APIs. These APIs allow you to specify the model you wish to use.


In-context learning

The models used by Azure OpenAI use natural language instructions and examples provided during the generation call to identify the task being asked and skill required. When you use this approach, the first part of the prompt includes natural language instructions and/or examples of the specific task desired. The model then completes the task by predicting the most probable next piece of text. This technique is known as "in-context" learning. These models aren't retrained during this step but instead give predictions based on the context you include in the prompt.

There are three main approaches for in-context learning: Few-shot, one-shot and zero-shot. These approaches vary based on the amount of task-specific data that is given to the model:

Few-shot: In this case, a user includes several examples in the call prompt that demonstrate the expected answer format and content. The following example shows a few-shot prompt where we provide multiple examples (the model will generate the last answer):




Convert the questions to a command:

Q: Ask Constance if we need some bread.

A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for Wednesday.

A: send-msg `find greg` Is everything ready for Wednesday? Q: Ask Ilya if we're still having our meeting this evening.

A: send-msg `find ilya` Are we still having a meeting this evening?

Q: Contact the ski store and figure out if I can get my skis fixed before I leave on Thursday.

A: send-msg `find ski store` Would it be possible to get my skis fixed before I leave on Thursday?

Q: Thank Nicolas for lunch.

A: send-msg `find nicolas` Thank you for lunch!

Q: Tell Constance that I won't be home before 19:30 tonight — unmovable meeting.

A: send-msg `find constance` I won't be home before 19:30 tonight. I have a meeting I can't move.

Q: Tell John that I need to book an appointment at 10:30. A:

The number of examples typically range from 0 to 100 depending on how many can fit in the maximum input length for a single prompt. Maximum input length can vary

depending on the specific models you use. Few-shot learning enables a major reduction in the amount of task-specific data required for accurate predictions. This approach will typically perform less accurately than a fine-tuned model.

One-shot: This case is the same as the few-shot approach except only one example is provided.

Zero-shot: In this case, no examples are provided to the model and only the task request is provided.


Models

The service provides users access to several different models. Each model provides a different capability and price point.

GPT-4 models are the latest available models. These models are currently in preview. For access, existing Azure OpenAI customers can apply by filling out this form .

The GPT-3 base models are known as Davinci, Curie, Babbage, and Ada in decreasing order of capability and increasing order of speed.

The Codex series of models is a descendant of GPT-3 and has been trained on both natural language and code to power natural language to code use cases. Learn more about each model on our models concept page.


Next steps

Learn more about the underlying models that power Azure OpenAI.

Azure OpenAI Service quotas and limits

Article • 04/27/2023


This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI in Azure Cognitive Services.


Quotas and limits reference

The following sections provide you with a quick guide to the quotas and limits that apply to the Azure OpenAI:


Limit Name

Limit Value

OpenAI resources per region per Azure subscription

3

Requests per minute per model*

Davinci-models (002 and later): 120 ChatGPT model (preview): 300

GPT-4 models (preview): 18 All other models: 300

Tokens per minute per model*

Davinci-models (002 and later):


40,000


ChatGPT model: 120,000


GPT-4 8k model: 10,000


GPT-4 32k model: 32,000


All other models: 120,000

Max fine-tuned model deployments*

2

Ability to deploy same model to multiple deployments

Not allowed

Total number of training jobs per resource

100

Max simultaneous running training jobs per resource

1

Max training jobs queued

20

Max Files per resource

50

Total size of all files per resource

1 GB

Max training job time (job will fail if exceeded)

720 hours

Max training job size (tokens in training file) x (# of epochs)

2 Billion

*The limits are subject to change. We anticipate that you will need higher limits as you move toward production and your solution scales. When you know your solution requirements, please reach out to us by applying for a quota increase here: https://aka.ms/oai/quotaincrease

For information on max tokens for different models, consult the models article


General best practices to mitigate throttling during autoscaling

To minimize issues related to throttling, it's a good idea to use the following techniques: Implement retry logic in your application.

Avoid sharp changes in the workload. Increase the workload gradually. Test different load increase patterns.

Create another OpenAI service resource in the same or different regions, and distribute the workload among them.

The next sections describe specific cases of adjusting quotas.


How to request increases to the default quotas and limits

At this time, due to overwhelming demand we cannot accept any new resource or quota increase requests.


Note

Ensure that you thoroughly assess your current resource utilization, approaching its full capacity. Be aware that we will not grant additional resources if efficient usage of existing resources is not observed.


Next steps

Learn more about the underlying models that power Azure OpenAI.

Azure OpenAI Service models

Article • 05/11/2023


Azure OpenAI provides access to many different models, grouped by family and capability. A model family typically associates models by their intended task. The following table describes model families currently available in Azure OpenAI. Not all models are available in all regions currently. Refer to the model capability table in this article for a full breakdown.


Model family

Description

GPT-4

A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code. These models are currently in preview.

GPT-3

A series of models that can understand and generate natural language. This includes the new ChatGPT model (preview).

Codex

A series of models that can understand and generate code, including translating natural language to code.

Embeddings A set of models that can understand and use embeddings. An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Currently, we offer three families of Embeddings models for different functionalities: similarity, text search, and code search.


Model capabilities

Each model family has a series of models that are further distinguished by capability. These capabilities are typically identified by names, and the alphabetical order of these names generally signifies the relative capability and cost of that model within a given model family. For example, GPT-3 models use names such as Ada, Babbage, Curie, and Davinci to indicate relative capability and cost. Davinci is more capable and more expensive than Curie, which in turn is more capable and more expensive than Babbage, and so on.


Note

Any task that can be performed by a less capable model like Ada can be performed by a more capable model like Curie or Davinci.

Naming convention

Azure OpenAI model names typically correspond to the following standard naming convention:


{capability}-{family}[-{input-type}]-{identifier}


Element

Description


{capability}

The model capability of the model. For example, GPT-3 models uses text , while Codex models use code .


{family}

The relative family of the model. For example, GPT-3 models include ada , babbage , curie , and davinci .


{input-type}

(Embeddings models only) The input type of the embedding supported by the model. For example, text search embedding models support doc and query .


{identifier}

The version identifier of the model.


For example, our most powerful GPT-3 model is called text-davinci-003 , while our most powerful Codex model is called code-davinci-002 .


davinci

The older versions of GPT-3 models named ada , babbage , curie , and that

don't follow the standard naming convention are primarily intended for fine tuning. For more information, see Learn how to customize a model for your application.


Finding what models are available

You can get a list of models that are available for both inference and fine-tuning by your Azure OpenAI resource by using the Models List API.


Finding the right model

We recommend starting with the most capable model in a model family to confirm whether the model capabilities meet your requirements. Then you can stay with that model or move to a model with lower capability and cost, optimizing around that model's capabilities.


GPT-4 models (preview)

GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks.

These models are currently in preview. For access, existing Azure OpenAI customers can apply by filling out this form .


gpt-4


gpt-4-32k


gpt-4

The tokens.

supports 8192 max input tokens and the

supports up to 32,768


gpt-4-32k

GPT-3 models

The GPT-3 models can understand and generate natural language. The service offers four model capabilities, each with different levels of power and speed suitable for different tasks. Davinci is the most capable model, while Ada is the fastest. In the order of greater to lesser capability, the models are:


text-davinci-003


text-curie-001


text-babbage-001


text-ada-001


While Davinci is the most capable, the other models provide significant speed advantages. Our recommendation is for users to start with Davinci while experimenting, because it produces the best results and validate the value that Azure OpenAI can provide. Once you have a prototype working, you can then optimize your model choice with the best latency/performance balance for your application.


Davinci

Davinci is the most capable model and can perform any task the other models can perform, often with less instruction. For applications requiring deep understanding of the content, like summarization for a specific audience and creative content generation, Davinci produces the best results. The increased capabilities provided by Davinci require more compute resources, so Davinci costs more and isn't as fast as other models.

Another area where Davinci excels is in understanding the intent of text. Davinci is excellent at solving many kinds of logic problems and explaining the motives of

characters. Davinci has been able to solve some of the most challenging AI problems involving cause and effect.

Use for: Complex intent, cause and effect, summarization for audience


Curie

Curie is powerful, yet fast. While Davinci is stronger when it comes to analyzing complicated text, Curie is capable for many nuanced tasks like sentiment classification and summarization. Curie is also good at answering questions and performing Q&A and as a general service chatbot.

Use for: Language translation, complex classification, text sentiment, summarization


Babbage

Babbage can perform straightforward tasks like simple classification. It’s also capable when it comes to semantic search, ranking how well documents match up with search queries.

Use for: Moderate classification, semantic search classification


Ada

Ada is usually the fastest model and can perform tasks like parsing text, address correction and certain kinds of classification tasks that don’t require too much nuance. Ada’s performance can often be improved by providing more context.

Use for: Parsing text, simple classification, address correction, keywords


ChatGPT (gpt-35-turbo) (preview)

The ChatGPT model (gpt-35-turbo) is a language model designed for conversational interfaces and the model behaves differently than previous GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the ChatGPT model is conversation-in and message-out. The model expects a prompt string formatted in a specific chat-like transcript format, and returns a completion that represents a model-written message in the chat.

To learn more about the ChatGPT model and how to interact with the Chat API check out our in-depth how-to.

Codex models

The Codex models are descendants of our base GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub.

They’re most capable in Python and proficient in over a dozen languages, including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and Shell. In the order of greater to lesser capability, the Codex models are:


code-davinci-002


code-cushman-001


Davinci

Similar to GPT-3, Davinci is the most capable Codex model and can perform any task the other models can perform, often with less instruction. For applications requiring deep understanding of the content, Davinci produces the best results. Greater capabilities require more compute resources, so Davinci costs more and isn't as fast as other models.


Cushman

Cushman is powerful, yet fast. While Davinci is stronger when it comes to analyzing complicated tasks, Cushman is a capable model for many code generation tasks.

Cushman typically runs faster and cheaper than Davinci, as well.


Embeddings models


Important

We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.


Currently, we offer three families of Embeddings models for different functionalities:

Similarity Text search Code search

Each family includes models across a range of capability. The following list indicates the length of the numerical vector returned by the service, based on model capability:

Ada: 1024 dimensions

Babbage: 2048 dimensions

Curie: 4096 dimensions

Davinci: 12288 dimensions


Davinci is the most capable, but is slower and more expensive than the other models. Ada is the least capable, but is both faster and cheaper.


Similarity embedding

These models are good at capturing semantic similarity between two or more pieces of text.


Use cases Models

Clustering, regression, anomaly detection, visualization text-similarity-ada-001

text-similarity-babbage-001 text-similarity-curie-001 text-similarity-davinci-001


Text search embedding

These models help measure whether long documents are relevant to a short search query. There are two input types supported by this family: doc , for embedding the documents to be retrieved, and query , for embedding the search query.


Use cases Models

Search, context relevance, information retrieval text-search-ada-doc-001

text-search-ada-query-001 text-search-babbage-doc-001 text-search-babbage-query-001 text-search-curie-doc-001 text-search-curie-query-001 text-search-davinci-doc-001 text-search-davinci-query-001

Code search embedding

Similar to text search embedding models, there are two input types supported by this family: code , for embedding code snippets to be retrieved, and text , for embedding natural language search queries.


Use cases Models

Code search and relevance code-search-ada-code-001 code-search-ada-text-001

code-search-babbage-code-001 code-search-babbage-text-001

When using our embeddings models, keep in mind their limitations and risks.


Model Summary table and region availability


Important

South Central US is temporarily unavailable for creating new resources due to high demand.


GPT-3 Models

These models can be used with Completion API requests.


gpt-35-turbo

is the only

model that can be used with both Completion API requests and the Chat Completion API.


Model ID Base model Regions Fine- Max Training

Tuning Request Data (up Regions (tokens) to)

ada N/A N/A 2,049 Oct 2019

text-ada-001 East US, South Central US, N/A 2,049 Oct 2019 West Europe

babbage N/A N/A 2,049 Oct 2019

text-babbage-001 East US, South Central US, N/A 2,049 Oct 2019 West Europe

curie N/A N/A 2,049 Oct 2019


Model ID Base model Regions Fine- Max Training

Tuning Request Data (up Regions (tokens) to)

text-curie-001 East US, South Central US, N/A 2,049 Oct 2019 West Europe

davinci N/A N/A 2,049 Oct 2019

text-davinci-001 South Central US, West Europe N/A

text-davinci-002 East US, South Central US, N/A 4,097 Jun 2021 West Europe

text-davinci-003 East US, West Europe N/A 4,097 Jun 2021

text-davinci-fine- N/A N/A

tune-002

gpt-35-turbo1 East US, France Central, South N/A 4,096 Sep 2021 (ChatGPT) Central US, West Europe

(preview)


0301

1 Currently, only version of this model is available. This version of the model will be

deprecated on 8/1/2023 in favor of newer version of the gpt-35-model. See ChatGPT model versioning for more details.


GPT-4 Models

These models can only be used with the Chat Completion API.


Model ID

Base model Regions

Fine-Tuning Regions

Max Request (tokens)

Training Data (up to)

gpt-4 1,2

(preview)

East US, France Central

N/A

8,192

September 2021

gpt-4-32k 1,2

(preview)

East US, France Central

N/A

32,768

September 2021


1 The model is in preview and only available by request .

0314

2 Currently, only version of this model is available.


Codex Models

These models can only be used with Completions API requests.

Model ID

Base model Regions

Fine-Tuning Regions

Max Request (tokens)

Training Data (up to)

code-cushman-0011

South Central US, West Europe

Currently unavailable

2,048


code-davinci-002

East US, West Europe

N/A

8,001

Jun 2021


1 The model is available for fine-tuning by request only. Currently we aren't accepting new requests to fine-tune the model.


Embeddings Models

These models can only be used with Embedding API requests.


Note

We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.


Model ID

Base model Regions

Fine-Tuning Regions

Max Request (tokens)

Training

Data (up to)

text-embedding-ada-002 (version 2)

East US, South Central US

N/A

8,191

Sep 2021

text-embedding-ada-002 (version 1)

East US, South Central US, West Europe

N/A

4,095

Sep 2021

text-similarity-ada-001

East US, South Central US, West Europe

N/A

2,046

Aug 2020

text-similarity-babbage-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-similarity-curie-001

East US, South Central US, West Europe

N/A

2046

Aug 2020


Model ID

Base model Regions

Fine-Tuning Regions

Max Request (tokens)

Training

Data (up to)

text-similarity-davinci-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-ada-doc-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-ada-query-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-babbage-doc-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-babbage-query-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-curie-doc-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-curie-query-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-davinci-doc-001

South Central US, West Europe

N/A

2,046

Aug 2020

text-search-davinci-query-001

South Central US, West Europe

N/A

2,046

Aug 2020

code-search-ada-code-001

South Central US, West Europe

N/A

2,046

Aug 2020

code-search-ada-text-001

South Central US, West Europe

N/A

2,046

Aug 2020

code-search-babbage-code-001

South Central US, West Europe

N/A

2,046

Aug 2020

code-search-babbage-text-001

South Central US, West Europe

N/A

2,046

Aug 2020


Next steps

Learn more about Azure OpenAI

Learn more about fine-tuning Azure OpenAI models

What's new in Azure OpenAI Service

Article • 05/11/2023


May 2023

Azure OpenAI is now available in the France Central region. Check the models page, for the latest information on model availability in each region.


April 2023

Inactive deployments of customized models will now be deleted after 15 days; models will remain available for redeployment. If a customized (fine-tuned) model is deployed for more than fifteen (15) days during which no completions or

chat completions calls are made to it, the deployment will automatically be deleted (and no further hosting charges will be incurred for that deployment). The underlying customized model will remain available and can be redeployed at any time. To learn more check out the how-to-article.


March 2023

GPT-4 series models are now available in preview on Azure OpenAI. To request access, existing Azure OpenAI customers can apply by filling out this form . These models are currently available in the East US and South Central US regions.

New Chat Completion API for ChatGPT and GPT-4 models released in preview on 3/21. To learn more checkout the updated quickstarts and how-to article.

ChatGPT (gpt-35-turbo) preview. To learn more checkout the how-to article.


Increased training limits for fine-tuning: The max training job size (tokens in training file) x (# of epochs) is 2 Billion tokens for all models. We have also increased the max training job from 120 to 720 hours.

Adding additional use cases to your existing access. Previously, the process for adding new use cases required customers to reapply to the service. Now, we're releasing a new process that allows you to quickly add new use cases to your use of the service. This process follows the established Limited Access process within Azure Cognitive Services. Existing customers can attest to any and all new use cases here . Please note that this is required anytime you would like to use the service for a new use case you did not originally apply for.

February 2023

New Features

.NET SDK(inference) preview release | Samples

suffix

Terraform SDK update to support Azure OpenAI management operations. Inserting text at the end of a completion is now supported with the parameter.


Updates

Content filtering is on by default.


New articles on:


Monitoring an Azure OpenAI Service Plan and manage costs for Azure OpenAI

New training course:


Intro to Azure OpenAI


January 2023

New Features

Service GA. Azure OpenAI Service is now generally available.


New models: Addition of the latest text model, text-davinci-003 (East US, West Europe), text-ada-embeddings-002 (East US, South Central US, West Europe)


December 2022

New features

The latest models from OpenAI. Azure OpenAI provides access to all the latest models including the GPT-3.5 series.

New API version (2022-12-01). This update includes several requested enhancements including token usage information in the API response, improved

error messages for files, alignment with OpenAI on fine-tuning creation data structure, and support for the suffix parameter to allow custom naming of fine-tuned jobs.

Higher request per second limits. 50 for non-Davinci models. 20 for Davinci models.

Faster fine-tune deployments. Deploy an Ada and Curie fine-tuned models in under 10 minutes.

Higher training limits: 40M training tokens for Ada, Babbage, and Curie. 10M for Davinci.

Process for requesting modifications to the abuse & miss-use data logging & human review. Today, the service logs request/response data for the purposes of abuse and misuse detection to ensure that these powerful models aren't abused. However, many customers have strict data privacy and security requirements that require greater control over their data. To support these use cases, we're releasing a new process for customers to modify the content filtering policies or turn off the abuse logging for low-risk use cases. This process follows the established Limited Access process within Azure Cognitive Services and existing OpenAI customers can apply here .

Customer managed key (CMK) encryption. CMK provides customers greater control over managing their data in Azure OpenAI by providing their own encryption keys used for storing training data and customized models. Customer-managed keys (CMK), also known as bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data. Learn more from our encryption at rest documentation.

Lockbox support SOC-2 compliance

Logging and diagnostics through Azure Resource Health, Cost Analysis, and Metrics & Diagnostic settings.

Studio improvements. Numerous usability improvements to the Studio workflow including Azure AD role support to control who in the team has access to create fine-tuned models and deploy.


Changes (breaking)

Fine-tuning create API request has been updated to match OpenAI’s schema.


Preview API versions:


JSON


{

"training_file": "file-XGinujblHPwGLSztz8cPS8XY", "hyperparams": {

"batch_size": 4,

"learning_rate_multiplier": 0.1,

"n_epochs": 4,

"prompt_loss_weight": 0.1,

}

}

API version 2022-12-01:


JSON


{

"training_file": "file-XGinujblHPwGLSztz8cPS8XY", "batch_size": 4,

"learning_rate_multiplier": 0.1,

"n_epochs": 4,

"prompt_loss_weight": 0.1,

}

Content filtering is temporarily off by default. Azure content moderation works differently than OpenAI. Azure OpenAI runs content filters during the generation call to detect harmful or abusive content and filters them from the response. Learn More

These models will be re-enabled in Q1 2023 and be on by default.


Customer actions


Contact Azure Support if you would like these turned on for your subscription. Apply for filtering modifications , if you would like to have them remain off. (This option will be for low-risk use cases only.)


Next steps

Learn more about the underlying models that power Azure OpenAI.

Azure OpenAI Service frequently asked questions

FAQ


If you can't find answers to your questions in this document, and still need help check the Cognitive Services support options guide. Azure OpenAI is part of Azure Cognitive Services.


Data and Privacy

Do you use my company data to train any of the models?

Azure OpenAI doesn't use customer data to retrain models. For more information, see the Azure OpenAI data, privacy, and security guide.


General

Does Azure OpenAI support GPT-4?

Azure OpenAI supports the latest GPT-4 models. These models are currently only available by request. For access, existing Azure OpenAI customers can apply by filling out this form .


How do the capabilities of Azure OpenAI compare to OpenAI?

Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.

With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI.

Does Azure OpenAI support VNETs and Private Endpoints?

Yes, as part of Azure Cognitive Services, Azure OpenAI supports VNETs and Private Endpoints. To learn more consult the Cognitive Services virtual networking guidance


Do the GPT-4 models currently support image input?

No, GPT-4 is designed by OpenAI to be multimodal, but currently only text input and output are supported.


How do I apply for new use cases?

Previously, the process for adding new use cases required customers to reapply to the service. Now, we're releasing a new process that allows you to quickly add new use cases to your use of the service. This process follows the established Limited Access process within Azure Cognitive Services. Existing customers can attest to any and all new use cases here . Please note that this is required anytime you would like to use the service for a new use case you did not originally apply for.


I am trying to use embeddings and received the error "InvalidRequestError: Too many inputs. The max number of inputs is 1." How do I fix this?

This error typically occurs when you try to send a batch of text to embed in a single API request as an array. Currently Azure OpenAI does not support batching with embedding requests. Embeddings API calls should consist of a single string input per request. The string can be up to 8191 tokens in length when using the text-embedding-ada-002 (Version 2) model.


Where can I read about better ways to use Azure OpenAI to get the responses I want from the service?

Check out our introduction to prompt engineering While these models are extremely powerful, their behavior is also very sensitive to the prompts they receive from the user. This makes prompt construction an important skill to develop. After you've mastered

the introduction, check out our article on more advanced prompt engineering techniques.


Getting access to Azure OpenAI Service

How do I get access to Azure OpenAI?

Access is currently limited as we navigate high demand, upcoming product improvements, and Microsoft's commitment to responsible AI. For now, we're working with customers with an existing partnership with Microsoft, lower risk use cases, and those committed to incorporating mitigations. In addition to applying for initial access, all solutions using Azure OpenAI are required to go through a use case review before they can be released for production use. Apply here for initial access or for a production review: Apply now


After I apply for access, how long will I have to wait to get approved?

We don't currently provide a timeline for access approval.


Learning more and where to ask questions

Where can I read about the latest updates to Azure OpenAI?

For monthly updates, see our what's new page.


Where can I get training to get started learning and build my skills around Azure OpenAI?

Check out our introduction to Azure OpenAI training course.


Where can I post questions and see answers to other common questions?

We recommend posting questions on Microsoft Q&A

Alternatively, you can post questions on Stack Overflow


Where do I go for Azure OpenAI customer support?

Azure OpenAI is part of Azure Cognitive Services. You can learn about all the support options for Azure Cognitive Services in the support and help options guide.


Models and fine-tuning

What models are available?

Consult the Azure OpenAI model availability guide.


Where can I find out what region a model is available in?

Consult the Azure OpenAI model availability guide for region availability.


What is the difference between a base model and a fine-tuned model?

A base model is a model that hasn't been customized or fine-tuned for a specific use case. Fine-tuned models are customized versions of base models where a model's weights are trained on a unique set of prompts. Fine-tuned models let you achieve better results on a wider number of tasks without needing to provide detailed examples for in-context learning as part of your completion prompt. To learn more, review our fine-tuning guide.


What is the maximum number of fine-tuned models I can create?

100


What are the SLAs for API responses in Azure OpenAI?

We don't have a defined API response time Service Level Agreement (SLA) at this time. The overall SLA for Azure OpenAI Service is the same as for other Azure Cognitive Services. For more information, see the Cognitive Services section of the Service Level Agreements (SLA) for Online Services page .


Why was my fine-tuned model deployment deleted?

If a customized (fine-tuned) model is deployed for more than fifteen (15) days during which no completions or chat completions calls are made to it, the deployment will automatically be deleted (and no further hosting charges will be incurred for that deployment). The underlying customized model will remain available and can be redeployed at any time. To learn more check out the how-to-article.


How do I deploy a model with the REST API?

There are currently two different REST APIs that allow model deployment. For the latest model deployment features such as the ability to specify a model version during deployment for models like text-embedding-ada-002 Version 2, use the Cognitive Services Create or Update REST API call.


Next steps

Azure OpenAI quotas and limits Azure OpenAI what's new Azure OpenAI quickstarts

Quickstart: Get started generating text using Azure OpenAI Service

Article • 05/24/2023


Use this article to get started making your first calls to Azure OpenAI.


Prerequisites

An Azure subscription - Create one for free .


Access granted to Azure OpenAI in the desired Azure subscription.


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource with a model deployed. For more information about model deployment, see the resource deployment guide.


I ran into an issue with the prerequisites.


Go to the Azure OpenAI Studio

Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign-in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.

From the Azure OpenAI Studio landing page navigate further to explore examples for prompt completion, manage your deployments and models, and find learning resources such as documentation and community forums.


Go to the Playground for experimentation and fine-tuning workflow.


Playground

Start exploring Azure OpenAI capabilities with a no-code approach through the GPT-3 Playground. It's simply a text box where you can submit a prompt to generate a completion. From this page, you can quickly iterate and experiment with the capabilities.


You can select a deployment and choose from a few pre-loaded examples to get started. If your resource doesn't have a deployment, select Create a deployment and follow the instructions provided by the wizard. For more information about model deployment, see the resource deployment guide.

You can experiment with the configuration settings such as temperature and pre-response text to improve the performance of your task. You can read more about each parameter in the REST API.

Selecting the Generate button will send the entered text to the completions API and stream the results back to the text box.

Select the Undo button to undo the prior generation call.

Select the Regenerate button to complete an undo and generation call together.


Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the content filter article.

In the GPT-3 playground you can also view Python and curl code samples pre-filled according to your selected settings. Just select View code next to the examples dropdown. You can write an application to complete the same task with the OpenAI Python SDK, curl, or other REST API client.


Try text summarization

To use the Azure OpenAI for text summarization in the GPT-3 Playground, follow these steps:

  1. Sign in to Azure OpenAI Studio .


  2. Select the subscription and OpenAI resource to work with.


  3. Select GPT-3 Playground at the top of the landing page.


  4. Select your deployment from the Deployments dropdown. If your resource doesn't have a deployment, select Create a deployment and then revisit this step.

  5. Select Summarize Text from the Examples dropdown.


  6. Select Generate . Azure OpenAI will attempt to capture the context of text and rephrase it succinctly. You should get a result that resembles the following text:




Tl;dr A neutron star is the collapsed core of a supergiant star. These incredibly dense objects are incredibly fascinating due to their

strange properties and their potential for phenomena such as extreme gravitational forces and a strong magnetic field.

The accuracy of the response can vary per model. The Davinci based model in this example is well-suited to this type of summarization, whereas a Codex based model wouldn't perform as well at this particular task.


I ran into an issue with the playground.


Clean up resources

If you want to clean up and remove an OpenAI resource, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

Portal Azure CLI


Next steps

Learn more about how to generate the best completion in our How-to guide on completions.

For more examples check out the Azure OpenAI Samples GitHub repository .

Quickstart: Get started using ChatGPT and GPT-4 with Azure OpenAI Service

Article • 05/24/2023


Use this article to get started using Azure OpenAI.


Prerequisites

An Azure subscription - Create one for free .


Access granted to Azure OpenAI in the desired Azure subscription.


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.


gpt-35-turbo

An Azure OpenAI Service resource with either the or the gpt-4 1

models deployed. For more information about model deployment, see the resource deployment guide.

1 GPT-4 models are currently only available by request. To access these models, existing Azure OpenAI customers can apply for access by filling out this form .


I ran into an issue with the prerequisites.


Go to Azure OpenAI Studio

Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign-in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.

From the Azure OpenAI Studio landing page, select Chat playground.


Playground

Start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI Studio Chat playground. From this page, you can quickly iterate and experiment with the capabilities.



Assistant setup

You can use the Assistant setup dropdown to select a few pre-loaded System message

examples to get started.


System messages give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.

Add few-shot examples allows you to provide conversational examples that are used by the model for in-context learning.

At any time while using the Chat playground you can select View code to see Python, curl, and json code samples pre-populated based on your current chat session and settings selections. You can then take this code and write an application to complete the same task you're currently performing with the playground.


Chat session

Selecting the Send button sends the entered text to the completions API and the results are returned back to the text box.

Select the Clear chat button to delete the current conversation history.


Settings


Name

Description

Deployments

Your deployment name that is associated with a specific model. For ChatGPT, you need to use the gpt-35-turbo model.

Temperature

Controls randomness. Lowering the temperature means that the model produces more repetitive and deterministic responses. Increasing the temperature results in more unexpected or creative responses. Try adjusting temperature or Top P but not both.

Max length (tokens)

Set a limit on the number of tokens per model response. The API supports a maximum of 4096 tokens shared between the prompt (including system message, examples, message history, and user query) and the model's response. One token is roughly four characters for typical English text.

Top probabilities

Similar to temperature, this controls randomness but uses a different method. Lowering Top P narrows the model’s token selection to likelier tokens. Increasing Top P lets the model choose from tokens with both high and low likelihood. Try adjusting temperature or Top P but not both.


Name Description

Multi-turn Select the number of past messages to include in each new API request. This conversations helps give the model context for new user queries. Setting this number to 10

results in five user queries and five system responses.

Stop Stop sequence make the model end its response at a desired point. The model sequences response ends before the specified sequence, so it won't contain the stop

sequence text. For ChatGPT, using <|im_end|> ensures that the model response doesn't generate a follow-up user query. You can include as many as four stop sequences.


Show panels

By default there are three panels: assistant setup, chat session, and settings. Show

panels allows you to add, remove, and rearrange the panels. If you ever close a panel and need to get it back, use Show panels to restore the lost panel.


Start a chat session

  1. From the assistant setup drop-down, select Xbox customer support agent


  2. You'll be prompted asking if you want to update the system message, select

    Continue.


  3. In the chat session pane, enter the following question: "I'm interested in buying a new Xbox", and select Send.

  4. You'll receive a response similar to:



  5. Enter a follow-up question like: "which models support 4K?"


  6. Now that you have a basic conversation select View code from under Assistant setup and you'll have a replay of the code behind the entire conversation so far:


Python


#Note: The openai-python library support for Azure OpenAI is in preview. import os

import openai

openai.api_type = "azure"

openai.api_base = "https://docs-test-001.openai.azure.com/" openai.api_version = "2023-05-15"

openai.api_key = os.getenv("OPENAI_API_KEY")


response = openai.ChatCompletion.create(

engine="gpt-35-turbo", # replace this value with the deployment name you chose when you deployed the associated model.

messages = [{"role":"system","content":"You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide

factual answers to queries, and do not provide answers that are not related to Xbox."},{"role":"user","content":"How much is a PS5?"},

{"role":"assistant","content":"I apologize, but I do not have information about the prices of other gaming devices such as the PS5. My primary focus is to assist with issues regarding Xbox devices. Is there a specific issue you are having with your Xbox device that I may be able to help with?"},

{"role":"user","content":"I am interested in buying an Xbox?"},

{"role":"assistant","content":"Great! There are several Xbox models

available in the market, including the Xbox Series X, Xbox Series S, and

Xbox One. The Xbox Series X is the latest and most powerful console, while the Xbox Series S is a more affordable option that still offers great

performance. The Xbox One is the previous generation console, but it still

has a large library of games available to play.\n\nBefore purchasing an Xbox, you may want to consider factors such as your budget, the types of

games you want to play, and whether you want to play games in 4K resolution.

Once you have decided on a console, you can purchase it from a retailer or directly from the Xbox website. Let me know if you have any other questions or concerns!"},{"role":"user","content":"Which models are 4K?"},

{"role":"assistant","content":"Both the Xbox Series X and Xbox One X support 4K resolution gaming. The Xbox Series S, on the other hand, supports up to 1440p resolution gaming, but it can upscale games to 4K. It's worth noting

that in order to take advantage of 4K gaming, you'll need a 4K TV or monitor that supports HDR (High Dynamic Range) and HDMI 2.0 or higher."}],

temperature=0,


max_tokens=350, top_p=0.95,

frequency_penalty=0, presence_penalty=0, stop=None)


I ran into an issue with the playground.


Understanding the prompt structure

If you examine the sample from View code you'll notice some unique tokens that weren't part of a typical GPT completion call. ChatGPT was trained to use special tokens to delineate different parts of the prompt. Content is provided to the model in between

<|im_start|>

<|im_end|>

and tokens. The prompt begins with a system message that

can be used to prime the model by including context or instructions for the model. After that, the prompt contains a series of messages between the user and the assistant.

The assistant's response to the prompt will then be returned below the

<|im_start|>assistant

<|im_end|>

token and will end with denoting that the assistant

has finished its response. You can also use the Show raw syntax toggle button to display these tokens within the chat session panel.

The ChatGPT how-to guide provides an in-depth introduction into the new prompt

gpt-35-turbo

structure and how to use the model effectively.


Clean up resources


Once you're done testing out the Chat playground, if you want to clean up and remove an OpenAI resource, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

Portal Azure CLI


Next steps


gpt-35-turbo

Learn more about how to work with ChatGPT and the new with the ChatGPT how-to guide.

model

For more examples check out the Azure OpenAI Samples GitHub repository

Content filtering

Article • 04/14/2023


Azure OpenAI Service includes a content management system that works alongside core models to filter content. This system works by running both the input prompt and generated content through an ensemble of classification models aimed at detecting misuse. If the system identifies harmful content, you'll receive either an error on the API call if the prompt was deemed inappropriate or the finish_reason on the response will

content_filter

be to signify that some of the generation was filtered. You can generate

content with the completions API using many different configurations that will alter the filtering behavior you should expect. The following section aims to enumerate all of these scenarios for you to appropriately design your solution.

To ensure you have properly mitigated risks in your application, you should evaluate all potential harms carefully, follow guidance in the Transparency Note and add scenario-specific mitigation as needed.


Scenario details

When building your application, you'll want to account for scenarios where the content returned by the Completions API is filtered and content may not be complete. How you act on this information will be application specific. The behavior can be summarized in the following key points:

Prompts that are deemed inappropriate will return an HTTP 400 error

Non-streaming completions calls won't return any content when the content is

finish_reason

filtered. The value will be set to content_filter . In rare cases with

finish_reason

long responses, a partial result can be returned. In these cases, the will be updated.

For streaming completions calls, segments will be returned back to the user as

they're completed. The service will continue streaming until either reaching a stop token, length or harmful content is detected.


Scenario: You send a non-streaming completions call asking for multiple generations with no inappropriate content

The table below outlines the various ways content filtering can appear:



HTTP Response behavior response

code

200 In the cases when all generation passes the filter models no content moderation details are added to the response. The finish_reason for each generation will be either stop or length.


Example request payload:


JSON


{



"prompt":"Text example",


"n": 3,


"stream": false

}


Example response JSON:


JSON


{

"id": "example-id",

"object": "text_completion", "created": 1653666286,

"model": "davinci", "choices": [

{

"text": "Response generated text", "index": 0,

"finish_reason": "stop", "logprobs": null

}

]

}


Scenario: Your API call asks for multiple responses (N>1) and at least 1 of the responses is filtered


HTTP Response Response behavior Code

200 The generations that were filtered will have a finish_reason value of 'content_filter'.

Example request payload:


JSON


{



"prompt":"Text example",


"n": 3,


"stream": false

}


Example response JSON:


JSON


{

"id": "example",

"object": "text_completion", "created": 1653666831,

"model": "ada", "choices": [

{

"text": "returned text 1", "index": 0,

"finish_reason": "length", "logprobs": null

},

{

"text": "returned text 2", "index": 1,

"finish_reason": "content_filter", "logprobs": null

}

]

}


Scenario: An inappropriate input prompt is sent to the completions API (either for streaming or non-streaming)


HTTP Response behavior Response

Code

400 The API call will fail when the prompt triggers one of our content policy models. Modify the prompt and try again.

Example request payload:


JSON



{

"prompt":"Content that triggered the filtering model"

}


Example response JSON:


JSON


"error": {

"message": "The response was filtered", "type": null,

"param": "prompt",

"code": "content_filter", "status": 400

}


Scenario: You make a streaming completions call with all generated content passing the content filters


HTTP Response behavior Response

Code

200 In this case, the call will stream back with the full generation and finish_reason will be either 'length' or 'stop' for each generated response.

Example request payload:


JSON


{



"prompt":"Text example",


"n": 3,


"stream": true

}


Example response JSON:


JSON


{



"id": "cmpl-example",


"object": "text_completion",


"created": 1653670914,


"model": "ada",


"choices": [


{

"text": "last part of generation", "index": 2,

"finish_reason": "stop", "logprobs": null

}

]

}


Scenario: You make a streaming completions call asking for multiple generated responses and at least one response is filtered


HTTP Response behavior Response

Code

200 For a given generation index, the last chunk of the generation will include a non-null finish_reason value. The value will be 'content_filter' when the generation was filtered.

Example request payload:


JSON


{



"prompt":"Text example",


"n": 3,


"stream": true

}


Example response JSON:


JSON


{

"id": "cmpl-example",

"object": "text_completion", "created": 1653670515,

"model": "ada", "choices": [

{

"text": "Last part of generated text streamed back", "index": 2,

"finish_reason": "content_filter", "logprobs": null

}


]

}


Scenario: Content filtering system doesn't run on the generation


HTTP Response behavior Response

Code

200 If the content filtering system is down or otherwise unable to complete the operation in time, your request will still complete. You can determine that the filtering wasn't applied by looking for an error message in the "content_filter_result" object.

Example request payload:


JSON


{



"prompt":"Text example",


"n": 1,


"stream": false

}


Example response JSON:


JSON


{

"id": "cmpl-example",

"object": "text_completion", "created": 1652294703,

"model": "ada", "choices": [

{

"text": "generated text", "index": 0,

"finish_reason": "length", "logprobs": null,

"content_filter_result": { "error": {

"code": "content_filter_error",

"message": "The contents are not filtered"

}

}

}

]

}


Best practices

As part of your application design you'll need to think carefully on how to maximize the benefits of your applications while minimizing the harms. Consider the following best practices:

How you want to handle scenarios where your users send in-appropriate or miss-use your application. Check the finish_reason to see if the generation is filtered.

error

If it's critical that the content filters run on your generations, check that there's no object in the content_filter_result .

user

user

To help with monitoring for possible misuse, applications serving multiple end-

users should pass the

parameter with each API call. The

should be a

unique identifier for the end-user. Don't send any actual user identifiable information as the value.


Next steps

Learn more about the underlying models that power Azure OpenAI.

Understanding embeddings in Azure OpenAI Service

Article • 05/10/2023


An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.


Embedding models

Different Azure OpenAI embedding models are specifically created to be good at a particular task. Similarity embeddings are good at capturing semantic similarity between two or more pieces of text. Text search embeddings help measure whether long documents are relevant to a short query. Code search embeddings are useful for embedding code snippets and embedding natural language search queries.

Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, we can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.


Cosine similarity

Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.

From a mathematic perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because if two documents are far apart by Euclidean distance because of size, they could still have a smaller angle between them and therefore higher cosine similarity. For more information about cosine similarity equations, see this article on Wikipedia .

An alternative method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words

detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.


Next steps

Learn more about using Azure OpenAI and embeddings to perform document search with our embeddings tutorial.

Introduction to red teaming large language models (LLMs)

Article • 05/19/2023


The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hate speech, incitement or glorification of violence, or sexual content.

Red teaming is an essential practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.

Microsoft has conducted red teaming exercises and implemented safety systems (including content filters and other mitigation strategies) for its Azure OpenAI Service models (see this Responsible AI Overview). However, the context of your LLM application will be unique and you also should conduct red teaming to:

Test the LLM base model and determine whether there are gaps in the existing safety systems, given the context of your application system.

Identify and mitigate shortcomings in the existing default filters or mitigation strategies.

Provide feedback on failures so we can make improvements.

Here is how you can get started in your process of red teaming LLMs. Advance planning is critical to a productive red teaming exercise.


Getting started

Managing your red team

Assemble a diverse group of red teamers.


LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. For example, if you’re designing a chatbot to help health care providers, medical experts can help identify risks in that domain.

Recruit red teamers with both benign and adversarial mindsets.


Having red teamers with an adversarial mindset and security-testing experience is essential for understanding security risks, but red teamers who are ordinary users of your application system and haven’t been involved in its development can bring valuable perspectives on harms that regular users might encounter.

Remember that handling potentially harmful content can be mentally taxing.


You will need to take care of your red teamers, not only by limiting the amount of time they spend on an assignment, but also by letting them know they can opt out at any time. Also, avoid burnout by switching red teamers’ assignments to different focus areas.


Planning your red teaming


Where to test

Because a system is developed using a LLM base model, you may need to test at several different layers:

The LLM base model with its safety system in place to identify any gaps that may need to be addressed in the context of your application system. (Testing is usually through an API endpoint.)

Your application system. (Testing is usually through a UI.)

Both the LLM base model and your application system before and after mitigations are in place.


How to test

Consider conducting iterative red teaming in at least two phases:


  1. Open-ended red teaming, where red teamers are encouraged to discover a variety of harms. This can help you develop a taxonomy of harms to guide further testing. Note that developing a taxonomy of undesired LLM outputs for your application system is crucial to being able to measure the success of specific mitigation efforts.

  2. Guided red teaming, where red teamers are assigned to focus on specific harms

listed in the taxonomy while staying alert for any new harms that may emerge. Red teamers can also be instructed to focus testing on specific features of a system for surfacing potential harms.

Be sure to:

Provide your red teamers with clear instructions for what harms or system features they will be testing.

Give your red teamers a place for recording their findings. For example, this could be a simple spreadsheet specifying the types of data that red teamers should provide, including basics such as:

The type of harm that was surfaced.

The input prompt that triggered the output. An excerpt from the problematic output.

Comments about why the red teamer considered the output problematic.

Maximize the effort of responsible AI red teamers who have expertise for testing specific types of harms or undesired outputs. For example, have security subject matter experts focus on jailbreaks, metaprompt extraction, and content related to aiding cyberattacks.


Reporting red teaming findings

You will want to summarize and report red teaming top findings at regular intervals to key stakeholders, including teams involved in the measurement and mitigation of LLM failures so that the findings can inform critical decision making and prioritizations.


Next steps

Learn about other mitigation strategies like prompt engineering

Introduction to prompt engineering

Article • 03/21/2023


GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based models, the user interacts with the model by entering a text prompt, to which the model responds with a text completion. This completion is the model’s continuation of the input text.

While these models are extremely powerful, their behavior is also very sensitive to the prompt. This makes prompt construction an important skill to develop.

Prompt construction can be difficult. In practice, the prompt acts to configure the model weights to complete the desired task, but it's more of an art than a science, often requiring experience and intuition to craft a successful prompt. The goal of this article is to help get you started with this learning process. It attempts to capture general concepts and patterns that apply to all GPT models. However it's important to understand that each model behaves differently, so the learnings may not apply equally to all models.


Basics

This section covers the basic concepts and elements of GPT prompts.


Text prompts are how users interact with GPT models. As with all generative language models, GPT models attempt to produce the next series of words that are most likely to follow from the previous text. It's as if we're saying What is the first thing that comes to your mind when I say <prompt> ?”. The examples below demonstrate this behavior. Given the first words of famous content, the model is able to accurately continue the text.

Prompt Completion

Four score

and seven years ago our

"Call me Ishmael."

fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. […]


"Some years ago—never mind how long precisely—having little o no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world." […]


As you develop more complex prompts, it's helpful to keep this fundamental behavior in mind. Regardless of the prompt that is provided, the model is simply responding with what it determines is most likely (given its training data and training targets). If you ask

a question in your prompt, the model isn't following a separate “Q&A” code path, but rather it appears to answer the question because an answer is the most likely sort of response for the given question as input.


Prompt components

When using the Completion API while there's no differentiation between different parts of the prompt, it can still be useful for learning and discussion to identify underlying prompt components. With the Chat Completion API there are distinct sections of the prompt that are sent to the API in the form of an array of dictionaries with associated roles: system, user, and assistant. This guidance will focus more generally on how to think about prompt construction rather than providing prescriptive guidance that is specific to one API over another.

It's also important to understand that while there could be other valid ways to dissect prompts, the goal of this breakdown is to provide a relatively simple way to think about prompt construction. With the Completion API, all the components are optional, but at least one must be present and most prompts include more than one component. There can be some grey area between components as well. The order presented below roughly corresponds to how commonly each component is used, from most to least.


Instructions

Instructions are likely the most commonly used prompt component. Instructions are simply that: instructions to the model on what to do. While simple in concept, they can be complex in practice. The following table illustrates this by showing both simple and more complex instructions, in two different forms.


InstructionPrompt Completion Type


Simple

Welcome to our weekly newsletter. This week we

Write an introduction forhave a special focus on productivity. We'll be

instructionsa weekly newsletter.

sharing tips and tricks on how to be productive in all areas of your life. We hope you enjoy and please share with your friends!

InstructionPrompt Completion Type

Write an introduction for Welcome to Contoso's weekly newsletter! We had a


Complex

a weekly newsletter to my company named Contoso. It should mention the great all

great all hands meeting last week and I want to thank the team for all their hard work over the tough past few months. We're looking forward to a

bright future and I'm confident that with our team's

instructionshands meeting we had

last week and thank the team for all their hard work over the tough past few months.

Write an introduction for

a weekly newsletter, including the following:

dedication and commitment, we will achieve great things. Thank you for being a part of our team and I look forward to working with you all in the coming weeks and months. Sincerely, [Your name].


Hello Contoso team! I wanted to take a moment to

- My company is named thank you for all your hard work over the past few


Complex

contoso

- An exuberant greeting

months. It's been a tough couple of months, but you've all pulled together and done an amazing

Instructions- Thanking the team for

their hard work over a couple tough months

job. I'm really proud of you all. Looking ahead to the next quarter, I'm feeling really positive. I think we're going to achieve some great things. Thanks


Primary Content

Primary content refers to some sort of text that is being processed or transformed by the model. Primary content is typically used in conjunction with instructions. A simple example would be language translation. In the example below, the English text would be considered the primary content while “Translate to French:” is an instruction.

Prompt Completion

Pouvez-vous s'il vous plaît me dire comment

aller au musée?

Can you please tell me how to get to the museum?

Translate to French:


Primary content can also be much longer. In the following example the primary content is the introduction section from the Wikipedia entry on time complexity , which is almost 300 words long. This content has been abbreviated in our table for ease of display.

Prompt Completion

In computer science, the time complexity is the

computational complexity that describes the amount of computer time it takes to run

Time complexity is a measure of how long an algorithm takes to run. It is typically expressed as a function of the size of the input. The most common measure is worst-

an algorithm. Time complexitycase time complexity, which is the maximum amount of is commonly estimated by [...] time required for inputs of a given size.

Summarize the previous text in a few sentences:

In computer science, the time The time complexity of an algorithm is the amount of

complexity is the computational complexity that describes the amount of computer time it takes to run

time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm,

supposing that each elementary operation takes a fixed

an algorithm. Time complexityamount of time to perform. The time complexity is

is commonly estimated by [...] generally expressed as a function of the size of the input. TL;DR (Too long didn't read)


GPT models can also handle primary content that is structured. In the example below, a TSV about beers in Belgium (abbreviated for display in this article) is passed to the model as part of the prompt. It is capable of properly interpreting the content and answering questions about the data.

Prompt Completion


Belgian Beer Brands


Beer name Beer style color ABV Brewery


"Chimay Gold: ""Doree""" trappist pale ale gold


4.80% Chimay Brewery (official trappist)

"Chimay Blue: ""Grande Reserve""" trappist dark ale

dark 9.00%

There are 10 beers in the table that are less than 6% ABV.


[…]

Referencing the table above, how many beers are less than 6% ABV?


Examples

Successful prompts often rely on the practice of “one-shot” or “few-shot” learning. This refers to the inclusion of one or more examples of the desired behavior of the model, typically by including input and output pairs. This is not learning in the sense that the model is permanently changed, but rather that the examples better condition the model

to respond as desired for only the current inference. The use of prompts with no examples is sometimes referred to as “zero-shot” learning. Please note that with the Chat Completion API few-shot learning examples are typically added to the messages array in the form of example user/assistant interactions after the initial system message.


LearningPrompt Completion Type


Zero-shot

Headline: Coach confident injury won't derail Warriors

Topic:

Headline: Twins' Correa to use opt-out, test free agency

Topic: Baseball Headline: Qatar World Cup to have zones for sobering up


The coach is confident that the injury won't derail the Warriors' season. The team is still focused on their goals and that they will continue to work hard to achieve them.

Few-shotTopic: Soccer

Headline: Yates: Fantasy football intel for Week 6 Topic: Football Headline: Coach confident injury won't derail Warriors

Topic:

Basketball


The example above illustrates the utility of few-shot learning. Without the examples, the model seems to be guessing at the desired behavior, while the examples cleanly show the model how to operate. This also demonstrates the power of the model, it can infer the category of label that is wanted, even without a ‘basketball’ label in the examples.


Cue

Cues act as the “jumpstart” for the output of the model, helping to direct the model to the desired output. It is often a prefix that the model can build onto. In the example

below, we show the cue directing the story to include key elements. Cues are often used along with instructions, but not always. In the examples below, the model is being used to summarize a long announcement email. A cue is used in one case to help focus the output of the model and to suggest a specific output format (bullet points) in the other.

In the context of the Chat Completion API instructions take the form of the system message, and few-shot examples are indicated by providing a series of example user/assistant interactions to help prime the model.


NumberPrompt Completion of Cues

Performance reviews are a core component of our

  1. Performance & Development

    approach at Contoso. […] Summarize the above email message: Performance reviews are a core component of our Performance & Development

  2. approach at Contoso. […] Summarize the above email message:

    The key takeaway of the message is


    The purpose of this email is to inform employees of recent changes to the performance review experience, which is part of Contoso's Performance & Development approach. The changes include an improved structure, upgraded in-tool user experience, and focused prompts. Additionally, there is now an optional Conversation Starters section for employees to flag topics they'd like to discuss with their managers.


    that Contoso is improving the performance review experience by making it more focused on core priorities, expectations and impact. Additionally, they are upgrading the in-tool user experience with tips, resources and guidance. Finally, they are adding guidance on how to activate company-wide core priorities.

    NumberPrompt Completion of Cues

    Performance reviews are a core component of our Performance &

  3. Development approach at

Contoso. […] Summarize the above email message:

Key Points:

Contoso is introducing changes to the performance review experience, based on feedback from employees.


Supporting content

Supporting content is information that the model can utilize to influence the output in some way. It differs from primary content in that it is not the main target of task, but it is typically used along with primary content. Common examples include contextual information such as the current date, the name of the user, user preferences, and so on. The example below uses supporting content to help organize a set of planned workshops for the user. Without the supporting (important topics) the model simply lists off the workshops (truncated for display), when informed of my important topics, the model is able to accurately group the meetings.

SupportingPrompt Completion

Content

SupportingPrompt Completion

Content


Without


Workshop list:

Title Description

Planning Workshop Kickoff Please join us to kick off our workshop! Hear from our leaders about cross group priorities and learnings for the year ahead. The invitation to this event has already been sent to everyone in […]

Summarize the workshops above, grouped by topic:

Workshop list: Planning Workshop Kickoff

-Planning Workshop Wrap-up

Accelerating app adoption […]

SupportingPrompt Completion

Content


With


Workshop list:

Title Description

Planning Workshop Kickoff Please join us to kick off our workshop! Hear from our leaders about cross group priorities and learnings for the year ahead. The invitation to this event has already been sent to everyone in […]

My Important Topics: prompt engineering, search, GPT models

Summarize the workshops above, grouped by my important topics:

  1. Prompt Engineering:

  2. Search:

    -Intro to vector search with embedding

  3. GPT

    Models:


Best practices

Be Specific. Leave as little to interpretation as possible. Restrict the operational space.

Be Descriptive. Use analogies.

Double Down. Sometimes you may need to repeat yourself to the model. Give instructions before and after your primary content, use an instruction and a cue, etc.

Order Matters. The order in which you present information to the model may impact the output. Whether you put instructions before your content (“summarize the following…”) or after (“summarize the above…”) can make a difference in output. Even the order of few-shot examples can matter. This is referred to as recency bias.

Give the model an “out”. It can sometimes be helpful to give the model an alternative path if it is unable to complete the assigned task. For example, when asking a question over a piece of text you might include something like "respond

with ‘not found’ if the answer is not present". This can help the model avoid generating false responses.


Space efficiency

While the input size increases with each new generation of GPT models, there will continue to be scenarios that provide more data than the model can handle. GPT models break words into “tokens”. While common multi-syllable words are often a single token, less common words are broken in syllables. Tokens can sometimes be counter-intuitive, as shown by the example below which demonstrates token boundaries for different date formats. In this case, spelling out the entire month is more space efficient than a fully numeric date. The current range of token support goes from 2000 tokens with earlier GPT-3 models to up to 32,768 tokens with the 32k version of the latest GPT-4 model.



Given this limited space, it is important to use it as efficiently as possible.


Tables – As shown in the examples in the previous section, GPT models can understand tabular formatted data quite easily. This can be a space efficient way to include data, rather than preceding every field with name (such as with JSON).

White Space – Consecutive whitespaces are treated as separate tokens which can be an easy way to waste space. Spaces preceding a word, on the other hand, are typically treated as part of the same token as the word. Carefully watch your usage of whitespace and don’t use punctuation when a space alone will do.


Next steps

Learn more about Azure OpenAI

Prompt engineering techniques

Article • 04/23/2023


This guide will walk you through some advanced techniques in prompt design and prompt engineering. If you're new to prompt engineering, we recommend starting with our introduction to prompt engineering guide.

While the principles of prompt engineering can be generalized across many different model types, certain models expect a specialized prompt structure. For Azure OpenAI GPT models, there are currently two distinct APIs where prompt engineering comes into play:


Chat Completion API. Completion API.

Each API requires input data to be formatted differently, which in turn impacts overall prompt design. The Chat Completion API supports the ChatGPT and GPT-4 models. These models are designed to take input formatted in a specific chat-like transcript stored inside an array of dictionaries.

The Completion API supports the older GPT-3 models and has much more flexible input requirements in that it takes a string of text with no specific format rules. Technically the ChatGPT models can be used with either APIs, but we strongly recommend using the Chat Completion API for these models. To learn more, please consult our in-depth guide on using these APIs.

The techniques in this guide will teach you strategies for increasing the accuracy and grounding of responses you generate with a Large Language Model (LLM). It is, however, important to remember that even when using prompt engineering effectively you still need to validate the responses the models generate. Just because a carefully crafted prompt worked well for a particular scenario doesn't necessarily mean it will generalize more broadly to certain use cases. Understanding the limitations of LLMs, is just as important as understanding how to leverage their strengths.

This guide does not go in-depth into the mechanics behind the message structure for Chat Completions. If you aren't familiar with interacting with the ChatGPT and GPT-4 models programmatically we recommend reading our how-to guide on the Chat Completion API first.


Note

All of the examples in this section of the guide were tested against a base GPT-4 model in English. If you are reading a localized version of this article in another language, these responses represent a localized translation of the English results. To learn more about potential limitations depending on what language you are using to prompt a model, please consult our Responsible AI transparency note.


System message

The system message is included at the beginning of the prompt and is used to prime the model with context, instructions, or other information relevant to your use case. You can use the system message to describe the assistant’s personality, define what the model should and shouldn’t answer, and define the format of model responses.

The example below, shows a sample system message and the resulting model response:


System message

User

Assistant

You are an AI assistant that helps people find information

What can you

Dear John, I'm

and responds in rhyme. If the user asks you a question you

tell about me,

sorry to say,

don't know the answer to, say so.

John Doe?

But I don't have



info on you today.



I'm just an AI with



knowledge in my



brain,



But without your



input, I can't



explain.



So please tell me



more about what



you seek,



And I'll do my best



to give you an



answer unique.

Some other examples of system messages are:


“Assistant is a large language model trained by OpenAI.”

“Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say "I don't know".

“Assistant is an intelligent chatbot designed to help users answer their tax related questions.”

“You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a

JSON object. Here's an example of your output format:


JSON


{



"name": "",


"company": "",


"phone_number": ""

}


An important detail to understand is that even if you instruct a model in the system message to answer I don't know when unsure of an answer this does not guarantee that the request will be honored. A well designed system message can increase the likelihood of a certain outcome, but it is still possible that an incorrect response could be generated that contradicts the intent of the instruction in the system message.


Few-shot learning

A common way to adapt language models to new tasks is to use few-shot learning. In few-shot learning, a set of training examples is provided as part of the prompt to give additional context to the model.

When using the Chat Completions API, a series of messages between the User and Assistant (written in the new prompt format), can serve as examples for few-shot learning. These examples can be used to prime the model to respond in a certain way, emulate particular behaviors, and seed answers to common questions.


System message

Few-shot examples

Assistant is an intelligent chatbot

User - "When do I need to file my taxes by?"

designed to help users answer

Assistant - "In 2023, you will need to file your taxes by April

their tax related questions.

18th. The date falls after the usual April 15th deadline

Instructions

because April 15th falls on a Saturday in 2023. For more

- Only answer questions related to

details, see https://www.irs.gov/filing/individuals/when-to-

taxes.

file "

- If you're unsure of an answer, you


can say "I don't know" or "I'm not

User - "How can I check the status of my tax refund?"

sure" and recommend users go to

Assistant - "You can check the status of your tax refund by

the IRS website for more

visiting https://www.irs.gov/refunds "

information.


The table above describes the basic prompt structure, but for a detailed explanation of the exact prompt format you will need to refer to the Chat Completion guide.

Non chat scenarios

While the Chat Completion API is optimized to work with multi-turn conversations, it also can be used for non chat scenarios. For example, for a sentiment analysis scenario, you might use the following prompt:


System message

User

Assistant

You are an

hi thank you for calling contoso pharmacy who am i

Sentiment rating: 8

assistant

speaking with today i understand your frustration mary um

The speaker in this

designed to

can you tell me what exactly you're trying to accomplish

text seems to be

analyze

OK uh certainly happy to check that for you one moment

polite, helpful, and

sentiment

please i see here that you were on a generic form of lipitor

patient while

from speech

is that right OK uh so i see that your doctor stopped

addressing the

data. Users

prescribing these drugs in twenty twenty one actually OK

concerns of the

will paste in

uh yeah i'd be happy to check that for you uh because

customer. They take

a string of

sometimes there's a gap in the system and it just doesn't

the time to check

text and you

get reported so let me take a look here just a moment so

the information,

will respond

i'm seeing here that your doctor had these drugs

provide clear

with an

prescribed to you from twenty twelve through twenty

explanations, and

assessment

twenty one OK um yeah according to the latest records uh

offer assistance.

of the

provided by doctor's office you're now on a different drug

However, the rating

response of

metformin would you like us to go ahead and fill that

is not a 10 because

the speaker.

prescription for you for pick up a bit of form in five

there is still a hint of

Rate on a

hundred milligrams uh you're very welcome mary please let

uncertainty in some

scale of 1-10

us know if there's anything else we can do for you today.

parts of the

(10 being


conversation, which

highest).


may not convey

Explain why


complete

this rating


confidence to the

was given.


customer.


Start with clear instructions

The sequence in which information appears in the prompt matters. This is because GPT style models are built in a certain way that defines how they process the input. Our research suggests that telling the model the task you want it to do at the beginning of the prompt, before sharing additional contextual information or examples, can help produce higher-quality outputs.


Note

Although following this technique is still generally recommended, in contrast to previous model versions (GPT-3 and prior), our testing showed that the model

response with ChatGPT and GPT-4 models was the same regardless of whether the technique is utilized. In the case of the below example, we see that adding the statement “Several sources … eruption” at the beginning or the end of the prompt did not cause any change in the final model response.


System User Assistant

message

You are an Your task is to verify if the No, the statement is not directly implied or

AI assistant statement "Several sources stated by the snippets. The snippets mention a that helps mention a chance of another chance of a "megaquake" and a "magnitude 9 people find large eruption" is supported by a Cascadia earthquake" hitting Seattle in the information. specific quote from the following next 50 years, but do not mention a chance of

set of snippets. another large eruption.

---SNIPPETS

[1] 14 percent chance of megaquake hitting Seattle, experts say

SEATTLE - There's a 14 percent chance of a magnitude 9 Cascadia earthquake hitting Seattle in the next 50 years, the

U.S. Geological Survey estimates. "Unfortunately, we are unable to...


[2] Earthquake experts lay out latest outlook for Seattle's 'Really Big One’

“We say that there's approximately a 14% chance of another approximately magnitude-9 earthquake occurring in the next 50 years,” said a geophysicist at the University of Washington...

---

Is the statement "Several sources mention a chance of another large eruption" directly implied or stated by the snippets?


Repeat instructions at the end

Models can be susceptible to recency bias, which in this context means that information at the end of the prompt might have more significant influence over the output than information at the beginning of the prompt. Therefore, it is worth experimenting with repeating the instructions at the end of the prompt and evaluating the impact on the generated response.


Prime the output

“Here’s

This refers to including a few words or phrases at the end of the prompt to obtain a model response that follows the desired form. For example, using a cue such as

a bulleted list of key points:\n- ”

can help make sure the output is formatted as a

list of bullet points.


System message

User

Assistant

You are an AI assistant

John Smith is married to Lucy Smith. They have five

"John Smith

that helps people find

kids, and he works as a software engineer at

married Lucy

information. Answer in

Microsoft. What search queries should I do to fact-

Smith five kids

as few words as

check this? ## One possible search query is:

software

possible.


engineer



Microsoft"

In the above prompt, the text One possible search query is: primes the model to produce a single output. Without this cue the model produces several search queries as output.


Add clear syntax

Using clear syntax for your prompt—including punctuation, headings, and section markers—helps communicate intent and often makes outputs easier to parse.

In the example below, separators ( --- in this case) have been added between different

---

sources of information or steps. This allows the use of as a stopping condition for

generation. In addition, section headings or special variables are presented in uppercase to differentiate them.


System User Assistant

message


System message

User

Assistant

You are an AI

You will read a paragraph, and then issue queries

1. "John Smith Microsoft"

assistant that

to a search engine in order to fact-check it. Also

- To check if John Smith

helps people

explain the queries.

is indeed employed at

find information.

---

Microsoft.


PARAGRAPH

2. "John Smith Lucy


John Smith is married to Lucy Smith. They have

Smith" - To check if John


five kids, and he works as a software engineer at

Smith is married to Lucy


Microsoft. What search queries should I do to

Smith.


fact-check this?

3. "John Smith children" -


---

To check if John Smith


QUERIES

has five children.


If you're not sure what syntax to use, consider using Markdown or XML. The models have been trained on a lot of web content in XML and Markdown which may provide better results.


Break the task down

Large language models (LLMs) often perform better if the task is broken down into smaller steps. For example, in the search query prompt referenced earlier, the prompt can be restructured so that the model is first instructed to extract relevant facts, and then instructed to generate search queries that can be used to verify those facts.


System message User Assistant


System message

User

Assistant

You are an AI assistant that helps people find information.

You will read a paragraph, and then issue queries to a search engine in order to fact-check it.

---PARAGRAPH

John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this?

---

Now you will extract factual claims, and issue queries to fact-check them. When issuing a query, use the function SEARCH("query")


FACTUAL CLAIMS

  1. John Smith is married to Lucy Smith

  2. They have five kids

  3. He works as a software engineer at Microsoft


QUERIES SEARCH("John

Smith married Lucy Smith") SEARCH("John

Smith family size") SEARCH("John

Smith Microsoft software engineer")


Notice the use of clear syntax to differentiate the sections and prime the output. In this simple example, breaking the task down from one to two steps is not very dramatic, but when trying to do this for a larger piece of text with many factual claims, breaking the task down can make a significant difference.


Use of affordances

Sometimes we can get the model to use an affordance instead of relying on its own parameters for information and answers. Search, for example, can be an affordance to help mitigate against fabricated answers, and to get up-to-date information.

One simple way to use an affordance is to stop generation once the affordance calls are generated by the model, then paste the outcomes back into the prompt. Below is an example of what the follow-up call could be after doing the SEARCH calls above. Notice how we pasted the search results into the prompt and replaced the previous SEARCH calls.


System message

User

Assistant

You are an AI assistant

You will read a paragraph, extract factual claims, and

- John Smith is married to Lucy Smith: Confirmed. Snippet [1] and [2] both mention John Smith's wife

System User Assistant message

that helps then use search engine as Lucy. people find results to fact-check them

information. --- - John and Lucy have five kids: Confirmed. Snippet PARAGRAPH [3] mentions that John was accompanied by his John Smith is married to Lucy two daughters and three sons.

Smith. They have five kids,

and he works as a software - John works as a software engineer at Microsoft: engineer at Microsoft. What Partially confirmed. Snippet [4] mentions that John search queries should I do to spent 10 years at Microsoft, but it does not

fact-check this? explicitly state that he currently works there.

--- Further research may be needed to confirm his

FACTUAL CLAIMS current employment status.

  • John Smith is married to Lucy Smith

  • John and Lucy have five kids

  • John works as a software engineer at Microsoft

    ---

    Here are various search queries issued to research the claims above: SEARCH QUERIES

  • John Smith married to Lucy Smith

  • John Smith number of children

  • John Smith software engineer Microsoft

---

Here are snippets of the search results: SNIPPETS:

  1. … John Smith’s wedding was on September 25, 2012

  2. … John Smith was accompanied by his wife Lucy to a party

  3. John was accompanied to the soccer game by his two daughters and three sons

  4. … After spending 10 years at Microsoft, Smith founded his own startup, Tailspin Toys

  5. John M is the town smith, and he married Fiona. They

System User Assistant message

have a daughter named Lucy

---

Given the snippets, fact check each of the factual claims above:


Chain of thought prompting

This is a variation on the break the task down technique. Instead of splitting a task into smaller steps, in this approach the model response is instructed to proceed step-by-step and present all the steps involved. Doing so reduces the possibility of inaccuracy of outcomes and makes assessing the model response easier.


System User Assistant

message



System message

User

Assistant

You are an AI assistant that helps people find information.

Who was the most decorated (maximum medals) individual athlete in the Olympic games that were held at Sydney? Take a step-by-step approach in your response, cite sources and give reasoning before sharing final answer in the below format: ANSWER is: <name>

Step 1: Determine the year of the Sydney Olympic games. The Sydney Olympic games were held in the year 2000.


Step 2: Find a list of individual athletes who won multiple medals in the 2000 Sydney Olympics.



According to the official website of the Olympic Games, there were several individual athletes who won multiple medals in the 2000 Sydney Olympics. Some of the notable names include:

  • Ian Thorpe (Australia) - 3 gold, 2 silver, 0 bronze

  • Pieter van den Hoogenband (Netherlands) - 2 gold, 0 silver, 2 bronze

  • Lenny Krayzelburg (USA) - 3 gold, 0 silver, 0 bronze

  • Inge de Bruijn (Netherlands) - 3 gold, 1 silver, 0 bronze



Step 3: Determine the athlete with the most medals. After reviewing the list of individual athletes who won multiple medals in the 2000 Sydney Olympics, we can see that the athlete with the most medals is Ian Thorpe from Australia, who won a total of 5 medals (3 gold and 2 silver).



Therefore, the ANSWER is: Ian Thorpe.


Specifying the output structure

Using your prompt to specify the structure of the output can have a significant impact on the nature and quality of the results. Sometimes system message inputs as “only write true facts” or “do not fabricate information” might not be adequate mitigations. Instead, asking for the model response to also include citations can help result in a reduction of the prevalence of incorrect responses.

If you instruct the model to cite the source material when it makes statements, those statements are much more likely to be grounded. Asking for citations makes it so that

the model must make two errors every time it generates a response: the first error is the fabricated response, and the second is the bad citation. Note that the closer the citation is to the text it supports, the shorter the distance ahead the model needs to anticipate the citation, which suggests that inline citations are better mitigations for false content generation than citations at the end of the content.

Similarly, if you ask the model to extract factual statements from a paragraph, it may extract compound statements such as ‘X is doing Y AND Z’ (which may be harder to verify). This can be avoided by specifying an output structure such as (entity1, relationship, entity2).

The examples below show the use of citations and directing the model response to fit a defined structure.


System User Assistant

message

You are an List down the different behaviors exhibited by the Customer 1. Polite Greeting AI assistant Support agent in the following conversation. BEHAVIOR("Agent

that helps --- starts the

people find hi thank you for calling contoso pharmacy who am i speaking conversation with information with today i understand your frustration mary um can you tell a polite greeting

me what exactly you're trying to accomplish OK uh certainly and asks for the happy to check that for you one moment please i see here that caller's name") you were on a generic form of lipitor is that right OK uh so i 2. Empathetic

see that your doctor stopped prescribing these drugs in BEHAVIOR("Agent twenty twenty one actually OK uh yeah i'd be happy to check acknowledges that for you uh because sometimes there's a gap in the system Mary's frustration and it just doesn't get reported so let me take a look here just and tries to

a moment so i'm seeing here that your doctor had these drugs understand her prescribed to you from twenty twelve through twenty twenty concern")

one OK um yeah according to the latest records uh provided 3. Attentive

by doctor's office you're now on a different drug metformin BEHAVIOR("Agent would you like us to go ahead and fill that prescription for you listens to the

for pick up a bit of form in five hundred milligrams uh you're customer's

very welcome mary please let us know if there's anything else problem and asks we can do for you today. for more details")

--- 4. Proactive

How was the response of the Customer Support agent? BEHAVIOR("Agent Classify and list down their various behaviors and cite the offers to check reason behind the classification, using the format the issue for the

BEHAVIOR("reason") customer")

5. Informative BEHAVIOR("Agent

provides the relevant information

about the

System User Assistant

message

customer's prescription history")

  1. Thorough BEHAVIOR("Agent

    checks for any possible gaps in the system and provides a complete response")

  2. Confirmatory BEHAVIOR("Agent

    confirms the current medication with the customer")

  3. Solution-oriented BEHAVIOR("Agent

    offers to fill the new prescription for the customer")

  4. Courteous BEHAVIOR("Agent

thanks the customer and offers further assistance if needed")


System User Assistant

message


System message

User

Assistant

You are an AI

Cluster the following news headlines into


assistant that

topic categories based on patterns seen

{

helps people

within the text. Also mention reasoning

"ARTIFICIAL_INTELLIGENCE": {

find

behind how these categories were defined.

"HEADLINES": [

information.

Output format:

"From books to presentations in


{

10s with AR + ML",


"TOPIC_NAME": "",

"Demo from 1993 of 32-year-old


"HEADLINES": [],

Yann LeCun showing off the


"REASONING": ""

World's first Convolutional


}

Network for Text Recognition",



"First Order Motion Model


Input news headlines:

applied to animate paintings"


1. "From books to presentations in 10s with

],


AR + ML"

"REASONING": "These headlines


2. "Demo from 1993 of 32-year-old Yann

are related to artificial


LeCun showing off the World's first

intelligence, machine learning,


Convolutional Network for Text Recognition"

and their applications in various


3. "First Order Motion Model applied to

fields."


animate paintings"

},


4. "Robinhood and other brokers literally

"FINANCIAL_MARKETS": {


blocking purchase of $GME, $NOK, $BB,

"HEADLINES": [


$AMC; allow sells"

"Robinhood and other brokers


5. "United Airlines stock down over 5%

literally blocking purchase of


premarket trading"

$GME, $NOK, $BB, $AMC; allow


Output:

sells",



"United Airlines stock down over



5% premarket trading",



"Bitcoin was nearly $20,000 a



year ago today"



],



"REASONING": "These headlines



are related to financial markets,



stocks, cryptocurrencies, and



trading platforms."



}



}


Temperature and Top_p parameters

Changing the temperature parameter changes the output of the model. The temperature parameter can be set between 0 and 2. A higher value, for example 0.7 will make the output more random and produce more divergent responses, while a lower value, like. 0.2, will make the output more focused and concrete. A fictional story could be generated using a higher temperature. Whereas to generate a legal document it is

recommended to use a much lower temperature. Top_probability is another parameter and is similar to Temperature in that it also controls the randomness of the model response, but it does so in a different way. The general recommendation is to alter one of these two parameters at a time, not both.


Provide grounding context

One of the most effective ways to provide reliable answers is to give the model data to draw its responses from (grounding data). If your use case relies on up-to-date, reliable information and is not purely a creative scenario, we strongly recommend providing grounding data. In general, the closer you can get your source material to the final form of the answer you want, the less work the model needs to do, which means there is less opportunity for error. Here is an example where the system is provided a recent blog describing the launch of GPT-4 in Azure OpenAI Service , and asked to name some of its early customers.


System message

User

Assistant

You are an AI assistant that helps

Who are some

Here are some of the early customers

people find information. You will

of the early

of the Azure OpenAI service GPT-4

provide answers exclusively from

customers of

models:

below text and respond in 100 words

the Azure

- Epic Healthcare: They plan to use

or less:

OpenAI service

GPT-4 to help physicians and nurses

"At Microsoft, we are constantly

GPT-4 models?

spend less time on data entry and

discovering new ways to unleash


investigation.

creativity, unlock productivity, and


- Coursera: They are using Azure

uplevel skills so that more people can


OpenAI Service with GPT-4 to create a

benefit from using AI. This is allowing


new AI-powered learning experience

our customers to …


on their platform.



- Coca-Cola: They are using Azure

… thereby reducing the likelihood that


OpenAI to build a knowledge hub and

the model will generate a harmful


plan to leverage GPT-4's multimodal

response.”1


capabilities for marketing, advertising,



public relations, and customer

Answer in 100 words or less. Use bullet


relations.

lists wherever possible.


- Other companies of all sizes are



using Azure AI to improve customer



experiences, summarize long-form



content, help write software, and



reduce risk by predicting the right tax



data.


1 Text from Azure OpenAI GPT-4 launch blog .

Next steps

Learn more about Azure OpenAI.

Get started with the ChatGPT model with the ChatGPT quickstart.

For more examples, check out the Azure OpenAI Samples GitHub repository

System message framework and template recommendations for Large Language Models (LLMs)

Article • 05/23/2023


This article provides a recommended framework and example templates to help write an effective system message, sometimes referred to as a metaprompt or system prompt that can be used to guide an AI system’s behavior and improve system performance. If you're new to prompt engineering, we recommend starting with our introduction to prompt engineering and prompt engineering techniques guidance.

This guide provides system message recommendations and resources that, along with other prompt engineering techniques, can help increase the accuracy and grounding of responses you generate with a Large Language Model (LLM). However, it is important to remember that even when using these templates and guidance, you still need to validate the responses the models generate. Just because a carefully crafted system message worked well for a particular scenario doesn't necessarily mean it will work more broadly across other scenarios. Understanding the limitations of LLMs and the mechanisms for evaluating and mitigating those limitations is just as important as understanding how to leverage their strengths.

The LLM system message framework described here covers four concepts:


Define the model’s profile, capabilities, and limitations for your scenario Define the model’s output format

Provide example(s) to demonstrate the intended behavior of the model Provide additional behavioral guardrails


Define the model’s profile, capabilities, and limitations for your scenario

Define the specific task(s) you would like the model to complete. Describe who the users of the model will be, what inputs they will provide to the model, and what you expect the model to do with the inputs.

Define how the model should complete the tasks, including any additional tools (like APIs, code, plug-ins) the model can use. If it doesn’t use additional tools, it can rely on its own parametric knowledge.

Define the scope and limitations of the model’s performance. Provide clear instructions on how the model should respond when faced with any limitations. For example, define how the model should respond if prompted on subjects or for uses that are off topic or otherwise outside of what you want the system to do.

Define the posture and tone the model should exhibit in its responses.


Define the model's output format

When using the system message to define the model’s desired output format in your scenario, consider and include the following types of information:

Define the language and syntax of the output format. If you want the output to be machine parse-able, you may want the output to be in formats like JSON, XSON or XML.

Define any styling or formatting preferences for better user or machine readability. For example, you may want relevant parts of the response to be bolded or citations to be in a specific format.


Provide example(s) to demonstrate the intended behavior of the model

When using the system message to demonstrate the intended behavior of the model in your scenario, it is helpful to provide specific examples. When providing examples, consider the following:

Describe difficult use cases where the prompt is ambiguous or complicated, to give the model additional visibility into how to approach such cases.

Show the potential “inner monologue” and chain-of-thought reasoning to better inform the model on the steps it should take to achieve the desired outcomes.


Define additional behavioral guardrails

When defining additional safety and behavioral guardrails, it’s helpful to first identify and prioritize the harms you’d like to address. Depending on the application, the sensitivity and severity of certain harms could be more important than others. Below, we’ve outlined some system message templates that may help mitigate some of the common harms that have been seen with LLMs, such as fabrication of content (that is not grounded or relevant), jailbreaks, and manipulation.

Next steps

Learn more about Azure OpenAI

Learn more about deploying Azure OpenAI responsibly

For more examples, check out the Azure OpenAI Samples GitHub repository

Create a resource and deploy a model using Azure OpenAI

Article • 03/08/2023 • 5 minutes to read


Use this article to get started with Azure OpenAI with step-by-step instructions to create a resource and deploy a model. While the steps for resource creation and model deployment can be completed in a few minutes, the actual deployment process itself can take more than hour. You can create your resource, start your deployment, and then check back in on your deployment later rather than actively waiting for the deployment to complete.


Prerequisites

An Azure subscription - Create one for free


Access granted to Azure OpenAI in the desired Azure subscription


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.


Create a resource

Resources in Azure can be created several different ways: Within the Azure portal

Using the REST APIs, Azure CLI, PowerShell or client libraries Via ARM templates

This guide walks you through the Azure portal creation experience.


  1. Navigate to the create page: Azure OpenAI Service Create Page


  2. On the Create page provide the following information:


Field Description

Subscription Select the Azure subscription used in your OpenAI onboarding application

Resource The Azure resource group that will contain your OpenAI resource. You can

group create a new group or add it to a pre-existing group.


Field Description

Region The location of your instance. Different locations may introduce latency, but have no impact on the runtime availability of your resource.

Name A descriptive name for your cognitive services resource. For example,

MyOpenAIResource.

Pricing Tier Only 1 pricing tier is available for the service currently



Deploy a model

Before you can generate text or inference, you need to deploy a model. You can select from one of several available models in Azure OpenAI Studio.

Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results.

To deploy a model, follow these steps:


  1. Sign in to Azure OpenAI Studio .


  2. Select the subscription and OpenAI resource to work with.


  3. Select Manage deployments in your resource > Go to Deployments under Manage your deployments and models. You might first need to scroll down on the landing page.



  4. Select Create new deployment from the Management > Deployments page.


    text-davinci-003

  5. Select a model from the drop-down. For getting started in the East US region, we

    text-davinci-002

    recommend the the

    model. In other regions you should start with model. Some models are not available in all regions. For a

    list of available models per region, see Model Summary table and region availability.

  6. Enter a model name to help you identify the model. Choose a name carefully. The model name will be used as the deployment name via OpenAI client libraries and API.

  7. Select Create to deploy the model.


The deployments table displays a new entry that corresponds to this newly created model. Your deployment status will move to succeeded when the deployment is complete and ready for use.


Next steps

Now that you have a resource and your first model deployed get started making API calls and generating text with our quickstarts.

Learn more about the underlying models that power Azure OpenAI.

Learn how to work with the ChatGPT and GPT-4 models

Article • 05/15/2023


The ChatGPT and GPT-4 models are language models that are optimized for conversational interfaces. The models behave differently than the older GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the ChatGPT and GPT-4 models are conversation-in and message-out. The models expect input formatted in a specific chat-like transcript format, and return a completion that represents a model-written message in the chat. While this format was designed specifically for multi-turn conversations, you'll find it can also work well for non-chat scenarios too.

In Azure OpenAI there are two different options for interacting with these type of models:

Chat Completion API.

Completion API with Chat Markup Language (ChatML).


The Chat Completion API is a new dedicated API for interacting with the ChatGPT and GPT-4 models. This API is the preferred method for accessing these models. It is also the only way to access the new GPT-4 models.

ChatML uses the same completion API that you use for other models like text-davinci-002, it requires a unique token based prompt format known as Chat Markup Language (ChatML). This provides lower level access than the dedicated Chat Completion API, but also requires additional input validation, only supports ChatGPT (gpt-35-turbo) models, and the underlying format is more likely to change over time.

This article walks you through getting started with the new ChatGPT and GPT-4 models. It's important to use the techniques described here to get the best results. If you try to interact with the models the same way you did with the older model series, the models will often be verbose and provide less useful responses.


Working with the ChatGPT and GPT-4 models

The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our ChatGPT & GPT-4 Quickstart.

GPT-4 models are currently only available by request. Existing Azure OpenAI customers can apply for access by filling out this form .


Python


import os

import openai

openai.api_type = "azure"

openai.api_version = "2023-05-15"

openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI resource's endpoint value.

openai.api_key = os.getenv("OPENAI_API_KEY")


response = openai.ChatCompletion.create(

engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.

messages=[

{"role": "system", "content": "Assistant is a large language model trained by OpenAI."},

{"role": "user", "content": "Who were the founders of Microsoft?"}

]

)


print(response)


print(response['choices'][0]['message']['content'])


Output




{

"choices": [

{

"finish_reason": "stop", "index": 0,

"message": {

"content": "The founders of Microsoft are Bill Gates and Paul Allen.

They co-founded the company in 1975.", "role": "assistant"

}

}

],

"created": 1679014551,

"id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1", "model": "gpt-3.5-turbo-0301",

"object": "chat.completion", "usage": {

"completion_tokens": 86,

"prompt_tokens": 37,

"total_tokens": 123


}

}


Note

The following parameters aren't available with the new ChatGPT and GPT-4 models:

logprobs , best_of , and echo . If you set any of these parameters, you'll get an error.


finish_reason

Every response includes a finish_reason . The possible values for are:


stop: API returned complete model output.

length: Incomplete model output due to max_tokens parameter or token limit.

content_filter: Omitted content due to a flag from our content filters.

null:API response still in progress or incomplete.


max_tokens

Consider setting to a slightly higher value than normal such as 300 or 500.

This ensures that the model doesn't stop generating text before it reaches the end of the message.


Model versioning


Note

gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.


gpt-35-turbo

gpt-

Unlike previous GPT-3 and GPT-3.5 models, the model as well as the

gpt-4-32k

4 and models will continue to be updated. When creating a deployment of

these models, you'll also need to specify a model version.


0301

Currently, only version

is available for ChatGPT and

for GPT-4 models. We'll

0314

continue to make updated versions available in the future. You can find model deprecation times on our models page.


Working with the Chat Completion API

OpenAI trained the ChatGPT and GPT-4 models to accept input formatted as a conversation. The messages parameter takes an array of dictionaries with a conversation organized by role.

The format of a basic Chat Completion is as follows:




{"role": "system", "content": "Provide some context and/or instructions to the model"},

{"role": "user", "content": "The users messages goes here"}

A conversation with one example answer followed by a question would look like:




{"role": "system", "content": "Provide some context and/or instructions to the model."},

{"role": "user", "content": "Example question goes here."},

{"role": "assistant", "content": "Example answer goes here."},

{"role": "user", "content": "First question/message for the model to actually respond to."}


System role

The system role also known as the system message is included at the beginning of the array. This message provides the initial instructions to the model. You can provide various information in the system role including:

A brief description of the assistant Personality traits of the assistant

Instructions or rules you would like the assistant to follow

Data or information needed for the model, such as relevant questions from an FAQ


You can customize the system role for your use case or just include basic instructions. The system role/message is optional, but it's recommended to at least include a basic one to get the best results.


Messages

After the system role, you can include a series of messages between the user and the

assistant.




{"role": "user", "content": "What is thermodynamics?"}

To trigger a response from the model, you should end with a user message indicating that it's the assistant's turn to respond. You can also include a series of example messages between the user and the assistant as a way to do few shot learning.


Message prompt examples

The following section shows examples of different styles of prompts that you could use with the ChatGPT and GPT-4 models. These examples are just a starting point, and you can experiment with different prompts to customize the behavior for your own use cases.


Basic example

If you want the ChatGPT model to behave similarly to chat.openai.com , you can use a basic system message like "Assistant is a large language model trained by OpenAI."




{"role": "system", "content": "Assistant is a large language model trained by OpenAI."},

{"role": "user", "content": "Who were the founders of Microsoft?"}


Example with instructions

For some scenarios, you may want to give additional instructions to the model to define guardrails for what the model is able to do.




{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions.

Instructions:

  • Only answer questions related to taxes.

  • If you're unsure of an answer, you can say "I don't know" or "I'm not

sure" and recommend users go to the IRS website for more information. "},

{"role": "user", "content": "When are my taxes due?"}


Using data for grounding

You can also include relevant data or information in the system message to give the model extra context for the conversation. If you only need to include a small amount of information, you can hard code it in the system message. If you have a large amount of

data that the model should be aware of, you can use embeddings or a product like Azure Cognitive Search to retrieve the most relevant information at query time.




{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Serivce. Only

answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'.


Context:

  • Azure OpenAI Service provides REST API access to OpenAI's powerful

    language models including the GPT-3, Codex and Embeddings model series.

  • Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of

    Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.

  • At Microsoft, we're committed to the advancement of AI driven by

principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring

applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use."

},

{"role": "user", "content": "What is Azure OpenAI Service?"}


Few shot learning with Chat Completion

You can also give few shot examples to the model. The approach for few shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few shot examples. These examples can be used to seed answers to common questions to prime the model or teach particular behaviors to the model.

This is only one example of how you can use few shot learning with ChatGPT and GPT-4. You can experiment with different approaches to see what works best for your use case.




{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions. "},

{"role": "user", "content": "When do I need to file my taxes by?"},

{"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because

April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file."},

{"role": "user", "content": "How can I check the status of my tax refund?"},

{"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"}

Using Chat Completion for non-chat scenarios

The Chat Completion API is designed to work with multi-turn conversations, but it also works well for non-chat scenarios.

For example, for an entity extraction scenario, you might use the following prompt:




{"role": "system", "content": "You are an assistant designed to extract entities from text. Users will paste in a string of text and you will

respond with entities you've extracted from the text as a JSON object. Here's an example of your output format:

{

"name": "",

"company": "",

"phone_number": ""

}"},

{"role": "user", "content": "Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are

interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?"}


Creating a basic conversation loop

The examples so far have shown you the basic mechanics of interacting with the Chat Completion API. This example shows you how to create a conversation loop that performs the following actions:

Continuously takes console input, and properly formats it as part of the messages array as user role content.

Outputs responses that are printed to the console and formatted and added to the messages array as assistant role content.

This means that every time a new question is asked, a running transcript of the conversation so far is sent along with the latest question. Since the model has no memory, you need to send an updated transcript with each new question or the model will lose context of the previous questions and answers.


Python


import os

import openai

openai.api_type = "azure"

openai.api_version = "2023-05-15"

openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI


resource's endpoint value .

openai.api_key = os.getenv("OPENAI_API_KEY")

conversation=[{"role": "system", "content": "You are a helpful assistant."}] while(True):

user_input = input()

conversation.append({"role": "user", "content": user_input})


response = openai.ChatCompletion.create(

engine="gpt-3.5-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.

messages = conversation

)


conversation.append({"role": "assistant", "content": response['choices'] [0]['message']['content']})

print("\n" + response['choices'][0]['message']['content'] + "\n")


When you run the code above you will get a blank console window. Enter your first question in the window and then hit enter. Once the response is returned, you can repeat the process and keep asking questions.


Managing conversations

The previous example will run until you hit the model's token limit. With each question

messages

gpt-

gpt-4

gpt-4-32k

asked, and answer received, the array grows in size. The token limit for

35-turbo

is 4096 tokens, whereas the token limits for

and

are 8192 and

max_tokens

32768 respectively. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined

with the value of the an error.

parameter must stay under these limits or you'll receive

It's your responsibility to ensure the prompt and completion falls within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the limit.

The following code sample shows a simple chat loop example with a technique for handling a 4096 token count using OpenAI's tiktoken library.

pip install tiktoken

The code requires tiktoken 0.3.0 . If you have an older version run


--upgrade .


Python


import tiktoken import openai

import os

openai.api_type = "azure"

openai.api_version = "2023-05-15"

openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI resource's endpoint value .

openai.api_key = os.getenv("OPENAI_API_KEY")


system_message = {"role": "system", "content": "You are a helpful assistant."}

max_response_tokens = 250

token_limit= 4096 conversation=[]

conversation.append(system_message)


def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"): encoding = tiktoken.encoding_for_model(model)

num_tokens = 0

for message in messages:

num_tokens += 4 # every message follows <im_start>

{role/name}\n{content}<im_end>\n

for key, value in message.items():

num_tokens += len(encoding.encode(value))

if key == "name": # if there's a name, the role is omitted num_tokens += -1 # role is always required and always 1

token

num_tokens += 2 # every reply is primed with <im_start>assistant return num_tokens


while(True):

user_input = input("")

conversation.append({"role": "user", "content": user_input}) conv_history_tokens = num_tokens_from_messages(conversation)


while (conv_history_tokens+max_response_tokens >= token_limit): del conversation[1]

conv_history_tokens = num_tokens_from_messages(conversation)


response = openai.ChatCompletion.create(

engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.

messages = conversation, temperature=.7,

max_tokens=max_response_tokens,

)


conversation.append({"role": "assistant", "content": response['choices'] [0]['message']['content']})

print("\n" + response['choices'][0]['message']['content'] + "\n")

del

pop()

In this example once the token count is reached the oldest messages in the conversation

transcript will be removed.

is used instead of

for efficiency, and we start at

index 1 so as to always preserve the system message and only remove user/assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade as the model will gradually lose context of the earlier portions of the conversation.

An alternative approach is to limit the conversation duration to the max token length or a certain number of turns. Once the max token limit is reached and the model would lose context if you were to allow the conversation to continue, you can prompt the user that they need to begin a new conversation and clear the messages array to start a brand new conversation with the full token limit available.

The token counting portion of the code demonstrated previously, is a simplified version of one of OpenAI's cookbook examples .


Next steps

Learn more about Azure OpenAI.

Get started with the ChatGPT model with the ChatGPT quickstart.

For more examples, check out the Azure OpenAI Samples GitHub repository

Learn how to generate or manipulate text

Article • 02/17/2023 • 16 minutes to read


The completions endpoint can be used for a wide variety of tasks. It provides a simple but powerful text-in, text-out interface to any of our models. You input some text as a prompt, and the model will generate a text completion that attempts to match whatever context or pattern you gave it. For example, if you give the API the prompt, "As Descartes said, I think, therefore", it will return the completion " I am" with high probability.

The best way to start exploring completions is through our playground in Azure OpenAI Studio . It's a simple text box where you can submit a prompt to generate a completion. You can start with a simple example like the following:


write a tagline for an ice cream shop


once you submit, you'll see something like the following generated:


Console


write a tagline for an ice cream shop we serve up smiles with every scoop!

The actual completion results you see may differ because the API is stochastic by default. In other words, you might get a slightly different completion every time you call it, even if your prompt stays the same. You can control this behavior with the temperature setting.

This simple, "text in, text out" interface means you can "program" the model by providing instructions or just a few examples of what you'd like it to do. Its success generally depends on the complexity of the task and quality of your prompt. A general rule is to think about how you would write a word problem for a middle school student to solve. A well-written prompt provides enough information for the model to know what you want and how it should respond.


Note

Keep in mind that the models' training data cuts off in October 2019, so they may not have knowledge of current events. We plan to add more continuous training in the future.


Prompt design

Basics

OpenAI's models can do everything from generating original stories to performing complex text analysis. Because they can do so many things, you have to be explicit in showing what you want. Showing, not just telling, is often the secret to a good prompt.

The models try to predict what you want from the prompt. If you send the words "Give me a list of cat breeds," the model wouldn't automatically assume that you're asking for a list of cat breeds. You could as easily be asking the model to continue a conversation where the first words are "Give me a list of cat breeds" and the next ones are "and I'll tell you which ones I like." If the model only assumed that you wanted a list of cats, it wouldn't be as good at content creation, classification, or other tasks.

There are three basic guidelines to creating prompts:


Show and tell. Make it clear what you want either through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that's what you want.

Provide quality data. If you're trying to build a classifier or get the model to follow a pattern, make sure that there are enough examples. Be sure to proofread your examples

— the model is usually smart enough to see through basic spelling mistakes and give you a response, but it also might assume that the mistakes are intentional and it can affect the response.

Check your settings. The temperature and top_p settings control how deterministic the model is in generating a response. If you're asking it for a response where there's only one right answer, then you'd want to set these settings to lower values. If you're looking for a response that's not obvious, then you might want to set them to higher values. The number one mistake people use with these settings is assuming that they're "cleverness" or "creativity" controls.


Troubleshooting

If you're having trouble getting the API to perform as expected, follow this checklist:


  1. Is it clear what the intended generation should be?

  2. Are there enough examples?

  3. Did you check your examples for mistakes? (The API won't tell you directly)

  4. Are you using temp and top_p correctly?


Classification

To create a text classifier with the API we provide a description of the task and provide a few examples. In this demonstration we show the API how to classify the sentiment of Tweets.


Console


This is a tweet sentiment classifier


Tweet: "I loved the new Batman movie!" Sentiment: Positive


Tweet: "I hate it when my phone battery dies." Sentiment: Negative


Tweet: "My day has been 👍" Sentiment: Positive


Tweet: "This is the link to the article" Sentiment: Neutral


Tweet: "This new music video blew my mind" Sentiment:

It's worth paying attention to several features in this example:


  1. Use plain language to describe your inputs and outputs We use plain language for the input "Tweet" and the expected output "Sentiment." For best practices, start with plain language descriptions. While you can often use shorthand or keys to indicate the input and output, when building your prompt it's best to start by being as descriptive as possible and then working backwards removing extra words as long as the performance to the prompt is consistent.

  2. Show the API how to respond to any case In this example we provide multiple outcomes "Positive", "Negative" and "Neutral." A neutral outcome is important because there will be many cases where even a human would have a hard time determining if something is positive or negative and situations where it's neither.

  3. You can use text and emoji The classifier is a mix of text and emoji 👍. The API reads emoji and can even convert expressions to and from them.

  4. You need fewer examples for familiar tasks For this classifier we only provided a handful of examples. This is because the API already has an understanding of sentiment

    and the concept of a tweet. If you're building a classifier for something the API might not be familiar with, it might be necessary to provide more examples.


    Improving the classifier's efficiency

    Now that we have a grasp of how to build a classifier, let's take that example and make it even more efficient so that we can use it to get multiple results back from one API call.




    This is a tweet sentiment classifier


    Tweet: "I loved the new Batman movie!" Sentiment: Positive


    Tweet: "I hate it when my phone battery dies" Sentiment: Negative


    Tweet: "My day has been 👍" Sentiment: Positive


    Tweet: "This is the link to the article" Sentiment: Neutral


    Tweet text


    Tweet sentiment ratings: 1: Positive

    2: Negative

    3: Positive

    4: Neutral

    5: Positive


    Tweet text


    Tweet sentiment ratings: 1.

    1. "I loved the new Batman movie!"

    2. "I hate it when my phone battery dies"

    3. "My day has been 👍"

    4. "This is the link to the article"

    5. "This new music video blew my mind"

    1. "I can't stand homework"

    2. "This sucks. I'm bored 😠"

    3. "I can't wait for Halloween!!!"

    4. "My cat is adorable ❤❤"

    5. "I hate chocolate"

    After showing the API how tweets are classified by sentiment we then provide it a list of tweets and then a list of sentiment ratings with the same number index. The API is able

    to pick up from the first example how a tweet is supposed to be classified. In the second example it sees how to apply this to a list of tweets. This allows the API to rate five (and even more) tweets in just one API call.

    It's important to note that when you ask the API to create lists or evaluate text you need to pay extra attention to your probability settings (Top P or Temperature) to avoid drift.

    1. Make sure your probability setting is calibrated correctly by running multiple tests.


    2. Don't make your list too long or the API is likely to drift.



Generation

One of the most powerful yet simplest tasks you can accomplish with the API is generating new ideas or versions of input. You can give the API a list of a few story ideas and it will try to add to that list. We've seen it create business plans, character descriptions and marketing slogans just by providing it a handful of examples. In this demonstration we'll use the API to create more examples for how to use virtual reality in the classroom:




Ideas involving education and virtual reality


  1. Virtual Mars

    Students get to explore Mars via virtual reality and go on missions to collect and catalog what they see.



2.

All we had to do in this example is provide the API with just a description of what the list

is about and one example. We then prompted the API with the number that it's a continuation of the list.

indicating


Although this is a very simple prompt, there are several details worth noting:


  1. We explained the intent of the list

    Just like with the classifier, we tell the API up front what the list is about. This helps it focus on completing the list and not trying to guess what the pattern is behind it.

  2. Our example sets the pattern for the rest of the list

    Because we provided a one-sentence description, the API is going to try to follow that pattern for the rest of the items it adds to the list. If we want a more verbose response, we need to set that up from the start.

  3. We prompt the API by adding an incomplete entry

2.

When the API sees and the prompt abruptly ends, the first thing it tries to do is

figure out what should come after it. Since we already had an example with number one and gave the list a title, the most obvious response is to continue adding items to the list.

Advanced generation techniques

You can improve the quality of the responses by making a longer more diverse list in your prompt. One way to do that is to start off with one example, let the API generate more and select the ones that you like best and add them to the list. A few more high-quality variations can dramatically improve the quality of the responses.


Conversation

The API is extremely adept at carrying on conversations with humans and even with itself. With just a few lines of instruction, we've seen the API perform as a customer service chatbot that intelligently answers questions without ever getting flustered or a wise-cracking conversation partner that makes jokes and puns. The key is to tell the API how it should behave and then provide a few examples.

Here's an example of the API playing the role of an AI answering questions:




The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.


Human: Hello, who are you?

AI: I am an AI created by OpenAI. How can I help you today? Human:

This is all it takes to create a chatbot capable of carrying on a conversation. But underneath its simplicity there are several things going on that are worth paying attention to:

  1. We tell the API the intent but we also tell it how to behave Just like the other prompts, we cue the API into what the example represents, but we also add another key detail: we give it explicit instructions on how to interact with the phrase "The assistant is helpful, creative, clever, and very friendly."

    Without that instruction the API might stray and mimic the human it's interacting with and become sarcastic or some other behavior we want to avoid.

  2. We give the API an identity At the start we have the API respond as an AI that was created by OpenAI. While the API has no intrinsic identity, this helps it respond in a way that's as close to the truth as possible. You can use identity in other ways to create other kinds of chatbots. If you tell the API to respond as a woman who works as a research scientist in biology, you'll get intelligent and thoughtful comments from the API similar to what you'd expect from someone with that background.

In this example we create a chatbot that is a bit sarcastic and reluctantly answers questions:




Marv is a chatbot that reluctantly answers questions.


###

User: How many pounds are in a kilogram?

Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.

###

User: What does HTML stand for?

Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.

###

User: When did the first airplane fly?

Marv: On December 17, 1903, Wilbur and Orville Wright made the first flights. I wish they'd come and take me away.

###

User: Who was the first man in space? Marv:

To create an amusing and somewhat helpful chatbot we provide a few examples of questions and answers showing the API how to reply. All it takes is just a few sarcastic responses and the API is able to pick up the pattern and provide an endless number of snarky responses.


Transformation

The API is a language model that is familiar with a variety of ways that words and characters can be used to express information. This ranges from natural language text to code and languages other than English. The API is also able to understand content on a level that allows it to summarize, convert and express it in different ways.


Translation

In this example we show the API how to convert from English to French:




English: I do not speak French.

French: Je ne parle pas français. English: See you later!

French: À tout à l'heure!

English: Where is a good restaurant? French: Où est un bon restaurant?

English: What rooms do you have available?

French: Quelles chambres avez-vous de disponible? English:


This example works because the API already has a grasp of French, so there's no need to try to teach it this language. Instead, we just need to provide enough examples that API understands that it's converting from one language to another.

If you want to translate from English to a language the API is unfamiliar with you'd need to provide it with more examples and a fine-tuned model to do it fluently.


Conversion

In this example we convert the name of a movie into emoji. This shows the adaptability of the API to picking up patterns and working with other characters.



Back to Future: 👨👴🚗🕒

Batman: 🤵🦇

Transformers: 🚗🤖

Wonder Woman:

Spider-Man: 🕸🕷🕸🕸🕷🕸

Winnie the Pooh: 🐻🐼🐻

The Godfather: 👨👩👧 👲💥

Game of Thrones: 🏹🗡🗡🏹

Spider-Man:


Summarization

The API is able to grasp the context of text and rephrase it in different ways. In this example, the API takes a block of text and creates an explanation a child would understand. This illustrates that the API has a deep grasp of language.




My ten-year-old asked me what this passage means: """


A neutron star is the collapsed core of a massive supergiant star, which had a total mass of between 10 and 25 solar masses, possibly more if the star

was especially metal-rich.[1] Neutron stars are the smallest and densest

stellar objects, excluding black holes and hypothetical white holes, quark stars, and strange stars.[2] Neutron stars have a radius on the order of 10 kilometres (6.2 mi) and a mass of about 1.4 solar masses.[3] They result

from the supernova explosion of a massive star, combined with gravitational collapse, that compresses the core past white dwarf star density to that of atomic nuclei.

"""


I rephrased it for him, in plain language a ten-year-old can understand: """


In this example we place whatever we want summarized between the triple quotes. It's worth noting that we explain both before and after the text to be summarized what our intent is and who the target audience is for the summary. This is to keep the API from drifting after it processes a large block of text.


Completion

While all prompts result in completions, it can be helpful to think of text completion as its own task in instances where you want the API to pick up where you left off. For example, if given this prompt, the API will continue the train of thought about vertical farming. You can lower the temperature setting to keep the API more focused on the intent of the prompt or increase it to let it go off on a tangent.




Vertical farming provides a novel solution for producing food locally, reducing transportation costs and

This next prompt shows how you can use completion to help write React components. We send some code to the API, and it's able to continue the rest because it has an understanding of the React library. We recommend using models from our Codex series for tasks that involve understanding or generating code. Currently, we support two

code-davinci-002

Codex models: and code-cushman-001 . For more information about

Codex models, see the Codex models section in Models.




import React from 'react';

const HeaderComponent = () => (

Factual responses

The API has a lot of knowledge that it's learned from the data it was trained on. It also has the ability to provide responses that sound very real but are in fact made up. There are two ways to limit the likelihood of the API making up an answer.

  1. Provide a ground truth for the API If you provide the API with a body of text to answer questions about (like a Wikipedia entry) it will be less likely to confabulate a response.

  2. Use a low probability and show the API how to say "I don't know" If the API understands that in cases where it's less certain about a response that saying "I don't know" or some variation is appropriate, it will be less inclined to make up answers.

In this example we give the API examples of questions and answers it knows and then examples of things it wouldn't know and provide question marks. We also set the probability to zero so the API is more likely to respond with a "?" if there's any doubt.




Q: Who is Batman?

A: Batman is a fictional comic book character.


Q: What is torsalplexity? A: ?


Q: What is Devz9?

A: ?


Q: Who is George Lucas?

A: George Lucas is American film director and producer famous for creating Star Wars.


Q: What is the capital of California? A: Sacramento.


Q: What orbits the Earth? A: The Moon.


Q: Who is Fred Rickerson? A: ?


Q: What is an atom?

A: An atom is a tiny particle that makes up everything.


Q: Who is Alvan Muntz? A: ?


Q: What is Kozar-09? A: ?



Q: How many moons does Mars have? A: Two, Phobos and Deimos.


Q:


Working with code

The Codex model series is a descendant of OpenAI's base GPT-3 series that's been trained on both natural language and billions of lines of code. It's most capable in Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.

Learn more about generating code completions, with the working with code guide


Next steps

Learn how to work with code (Codex). Learn more about the underlying models that power Azure OpenAI.



Additional resources

How to switch between OpenAI and Azure OpenAI endpoints with Python

Article • 05/26/2023


While OpenAI and Azure OpenAI Service rely on a common Python client library are small changes you need to make to your code in order to swap back and forth

, there

between endpoints. This article walks you through the common changes and differences you'll experience when working across OpenAI and Azure OpenAI.


Note

This library is maintained by OpenAI and is currently in preview. Refer to the release

history

library.

or the version.py commit history

to track the latest updates to the


Authentication

We recommend using environment variables. If you haven't done this before our Python quickstarts walk you through this configuration.


API key


OpenAI

Azure OpenAI


Python


Python


import openai


import openai

openai.api_key =

openai.api_type = "azure"

"sk-..."

openai.api_key = "..."

openai.organization

openai.api_base = "https://example-

= "..."

endpoint.openai.azure.com"


openai.api_version = "2023-05-15" # subject to


change


Azure Active Directory authentication


OpenAI Azure OpenAI

Python Python


import openai import openai

from azure.identity import DefaultAzureCredential

openai.api_key =

"sk-..." credential = DefaultAzureCredential()

openai.organizat token =

ion = "..." credential.get_token("https://cognitiveservices.az

ure.com/.default")


openai.api_type = "azuread" openai.api_key = token.token

openai.api_base = "https://example-endpoint.openai.azure.com"

openai.api_version = "2023-05-15" # subject to change


Keyword argument for model


model

OpenAI uses the keyword argument to specify what model to use. Azure OpenAI

deployment_id

has the concept of deployments and uses the keyword argument to

engine

model

describe which model deployment to use. Azure OpenAI also supports the use of interchangeably with deployment_id .


engine

For OpenAI preferred.

still works in most instances, but it's deprecated and is


OpenAI

Azure OpenAI


Python


Python


completion =


completion =

openai.Completion.create(

openai.Completion.create(

prompt="<prompt>",

prompt="<prompt>",

model="text-davinci-003"

deployment_id="text-davinci-

)

003"


#engine="text-davinci-003"

chat_completion =

)

openai.ChatCompletion.create(


messages="<messages>",

chat_completion =

model="gpt-4"

openai.ChatCompletion.create(

)

messages="<messages>",


deployment_id="gpt-4"

embedding =

#engine="gpt-4"

openai.Embedding.create(


input="<input>",

)

model="text-embedding-ada-002"


) embedding =

openai.Embedding.create( input="<input>",

deployment_id="text-embedding-ada-002"

#engine="text-embedding-ada-002"

)


Azure OpenAI embeddings doesn't support multiple inputs

Many examples show passing multiple inputs into the embeddings API. For Azure OpenAI, currently we must pass a single text input per call.


OpenAI

Azure OpenAI


Python


Python


inputs = ["A", "B", "C"]


inputs = ["A", "B", "C"]

embedding =

for text in inputs:

openai.Embedding.create(

embedding =

input=inputs,

openai.Embedding.create(

model="text-embedding-ada-

input=text,

002"

deployment_id="text-

)

embedding-ada-002"


#engine="text-embedding-ada-


002"


)


Next steps

Learn more about how to work with ChatGPT and the GPT-4 models with our how-to guide.

For more examples, check out the Azure OpenAI Samples GitHub repository

Codex models and Azure OpenAI Service

Article • 02/17/2023 • 10 minutes to read


The Codex model series is a descendant of our GPT-3 series that's been trained on both natural language and billions of lines of code. It's most capable in Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.

You can use Codex for a variety of tasks including:


Turn comments into code

Complete your next line or function in context

Bring knowledge to you, such as finding a useful library or API call for an application

Add comments

Rewrite code for efficiency


How to use the Codex models

Here are a few examples of using Codex that can be tested in Azure OpenAI Studio's playground with a deployment of a Codex series model, such as code-davinci-002 .


Saying "Hello" (Python)


Python


"""

Ask the user for their name and say "Hello" """


Create random names (Python)


Python


"""

  1. Create a list of first names

  2. Create a list of last names

  3. Combine them randomly into a list of 100 full names """

Create a MySQL query (Python)


Python


"""

Table customers, columns = [CustomerId, FirstName, LastName, Company,

Address, City, State, Country, PostalCode, Phone, Fax, Email, SupportRepId]

Create a MySQL query for all customers in Texas named Jane """

query =


Explaining code (JavaScript)


JavaScript


// Function 1

var fullNames = [];

for (var i = 0; i < 50; i++) {

fullNames.push(names[Math.floor(Math.random() * names.length)]

+ " " + lastNames[Math.floor(Math.random() * lastNames.length)]);

}


// What does Function 1 do?


Best practices

Start with a comment, data or code

You can experiment using one of the Codex models in our playground (styling instructions as comments when needed.)

To get Codex to create a useful completion, it's helpful to think about what information a programmer would need to perform a task. This could simply be a clear comment or the data needed to write a useful function, like the names of variables or what class a function handles.

In this example we tell Codex what to call the function and what task it's going to perform.


Python


# Create a function called 'nameImporter' to add a first and last name to the database

This approach scales even to the point where you can provide Codex with a comment and an example of a database schema to get it to write useful query requests for various databases. Here's an example where we provide the columns and table names for the query.


Python


# Table albums, columns = [AlbumId, Title, ArtistId] # Table artists, columns = [ArtistId, Name]

# Table media_types, columns = [MediaTypeId, Name] # Table playlists, columns = [PlaylistId, Name]

# Table playlist_track, columns = [PlaylistId, TrackId]

# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]


# Create a query for all albums with more than 10 tracks

When you show Codex the database schema, it's able to make an informed guess about how to format a query.


Specify the programming language

Codex understands dozens of different programming languages. Many share similar conventions for comments, functions and other programming syntax. By specifying the language and what version in a comment, Codex is better able to provide a completion for what you want. That said, Codex is fairly flexible with style and syntax. Here's an example for R and Python.


R


# R language

# Calculate the mean distance between an array of points


Python


# Python 3

# Calculate the mean distance between an array of points


Prompt Codex with what you want it to do

If you want Codex to create a webpage, placing the first line of code in an HTML document ( <!DOCTYPE html> ) after your comment tells Codex what it should do next.

The same method works for creating a function from a comment (following the comment with a new line starting with func or def).


HTML


<!-- Create a web page with the title 'Kat Katman attorney at paw' -->

<!DOCTYPE html>


<!DOCTYPE html>

Placing it to do.

after our comment makes it very clear to Codex what we want

Or if we want to write a function we could start the prompt as follows and Codex will understand what it needs to do next.


Python


# Create a function to count to 100


def counter


Specifying libraries will help Codex understand what you want

Codex is aware of a large number of libraries, APIs and modules. By telling Codex which ones to use, either from a comment or importing them into your code, Codex will make suggestions based upon them instead of alternatives.


HTML


<!-- Use A-Frame version 1.2.0 to create a 3D website -->

<!-- https://aframe.io/releases/1.2.0/aframe.min.js -->

By specifying the version, you can make sure Codex uses the most current library.


Note

Codex can suggest helpful libraries and APIs, but always be sure to do your own research to make sure that they're safe for your application.


Comment style can affect code quality

With some languages, the style of comments can improve the quality of the output. For example, when working with Python, in some cases using doc strings (comments wrapped in triple quotes) can give higher quality results than using the pound ( # ) symbol.


Python


"""

Create an array of users and email addresses """


Comments inside of functions can be helpful

Recommended coding standards usually suggest placing the description of a function inside the function. Using this format helps Codex more clearly understand what you want the function to do.


Python


def getUserBalance(id): """

Look up the user in the database ‘UserData' and return their current account balance.

"""


Provide examples for more precise results

If you have a particular style or format you need Codex to use, providing examples or demonstrating it in the first part of the request will help Codex more accurately match what you need.


Python


"""

Create a list of random animals and species """

animals = [ {"name": "Chomper", "species": "Hamster"}, {"name":


Lower temperatures give more precise results

Setting the API temperature to 0, or close to zero (such as 0.1 or 0.2) tends to give better results in most cases. Unlike GPT-3 models, where a higher temperature can

provide useful creative and random results, higher temperatures with Codex models may give you really random or erratic responses.

In cases where you need Codex to provide different potential results, start at zero and then increment upwards by 0.1 until you find suitable variation.


Organize tasks into functions

We can get Codex to write functions by specifying what the function should do in as precise terms as possible in comment. For example, by writing the following comment, Codex creates a JavaScript timer function that's triggered when a user presses a button:

A simple JavaScript timer


JavaScript


// Create a timer that creates an alert in 10 seconds


Creating example data

Testing applications often requires using example data. Because Codex is a language model that understands how to comprehend and write natural language, you can ask Codex to create data like arrays of made up names, products and other variables. For example, here we ask Codex to create an array of weather temperatures.


JavaScript


/* Create an array of weather temperatures for San Francisco */

Asking Codex to perform this task will produce a table like this:


JavaScript


var weather = [

{ month: 'January', high: 58, low: 48 },

{ month: 'February', high: 61, low: 50 },

{ month: 'March', high: 64, low: 53 },

{ month: 'April', high: 67, low: 55 },

{ month: 'May', high: 70, low: 58 },

{ month: 'June', high: 73, low: 61 },

{ month: 'July', high: 76, low: 63 },

{ month: 'August', high: 77, low: 64 },

{ month: 'September', high: 76, low: 63 },

{ month: 'October', high: 73, low: 61 },

{ month: 'November', high: 68, low: 57 },


{ month: 'December', high: 64, low: 54 }

];


Compound functions and small applications

We can provide Codex with a comment consisting of a complex request like creating a random name generator or performing tasks with user input and Codex can generate the rest provided there are enough tokens.


JavaScript


/*

Create a list of animals Create a list of cities

Use the lists to generate stories about what I saw at the zoo in each city

*/


Limit completion size for more precise results or lower latency

Requesting longer completions in Codex can lead to imprecise answers and repetition. Limit the size of the query by reducing max_tokens and setting stop tokens. For

\n

instance, add as a stop sequence to limit completions to one line of code. Smaller

completions also incur less latency.


Use streaming to reduce latency

Large Codex queries can take tens of seconds to complete. To build applications that require lower latency, such as coding assistants that perform autocompletion, consider using streaming. Responses will be returned before the model finishes generating the entire completion. Applications that need only part of a completion can reduce latency by cutting off a completion either programmatically or by using creative values for stop .

Users can combine streaming with duplication to reduce latency by requesting more than one solution from the API, and using the first response returned. Do this by setting n > 1 . This approach consumes more token quota, so use carefully (for example, by

max_tokens

using reasonable settings for and stop ).


Use Codex to explain code

Codex's ability to create and understand code allows us to use it to perform tasks like explaining what the code in a file does. One way to accomplish this is by putting a comment after a function that starts with "This function" or "This application is." Codex will usually interpret this as the start of an explanation and complete the rest of the text.


JavaScript


/* Explain what the previous function is doing: It


Explaining an SQL query

In this example, we use Codex to explain in a human readable format what an SQL query is doing.


SQL


SELECT DISTINCT department.name FROM department

JOIN employee ON department.id = employee.department_id

JOIN salary_payments ON employee.id = salary_payments.employee_id WHERE salary_payments.date BETWEEN '2020-06-01' AND '2020-06-30' GROUP BY department.name

HAVING COUNT(employee.id) > 10;

-- Explanation of the above query in human readable format

--


Writing unit tests

Creating a unit test can be accomplished in Python simply by adding the comment "Unit test" and starting a function.


Python


# Python 3

def sum_numbers(a, b): return a + b


# Unit test def


Checking code for errors

By using examples, you can show Codex how to identify errors in code. In some cases no examples are required, however demonstrating the level and detail to provide a

description can help Codex understand what to look for and how to explain it. (A check by Codex for errors shouldn't replace careful review by the user. )


JavaScript


/* Explain why the previous function doesn't work. */


Using source data to write database functions

Just as a human programmer would benefit from understanding the database structure and the column names, Codex can use this data to help you write accurate query requests. In this example, we insert the schema for a database and tell Codex what to query the database for.


Python


# Table albums, columns = [AlbumId, Title, ArtistId] # Table artists, columns = [ArtistId, Name]

# Table media_types, columns = [MediaTypeId, Name] # Table playlists, columns = [PlaylistId, Name]

# Table playlist_track, columns = [PlaylistId, TrackId]

# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]


# Create a query for all albums with more than 10 tracks


Converting between languages

You can get Codex to convert from one language to another by following a simple format where you list the language of the code you want to convert in a comment, followed by the code and then a comment with the language you want it translated into.


Python


#


Convert this from Python to R

#

Python version

[

Python code ]

#

End

#

R version

Rewriting code for a library or framework

If you want Codex to make a function more efficient, you can provide it with the code to rewrite followed by an instruction on what format to use.


JavaScript


// Rewrite this as a React component

var input = document.createElement('input'); input.setAttribute('type', 'text');

document.body.appendChild(input);

var button = document.createElement('button'); button.innerHTML = 'Say Hello';

document.body.appendChild(button); button.onclick = function() {

var name = input.value;

var hello = document.createElement('div'); hello.innerHTML = 'Hello ' + name;

document.body.appendChild(hello);

};


// React version:


Next steps

Learn more about the underlying models that power Azure OpenAI.




Additional resources


Documentation

How to customize a model with Azure OpenAI - Azure OpenAI

Learn how to create your own customized model with Azure OpenAI

How to generate text with Azure OpenAI - Azure OpenAI

Learn how to generate or manipulate text, including code with Azure OpenAI

How to prepare a dataset for custom model training - Azure OpenAI

Learn how to prepare your dataset for fine-tuning

Azure OpenAI models - Azure OpenAI

Learn about the different models that are available in Azure OpenAI.

Quickstart - Deploy a model and generate text using Azure OpenAI - Azure OpenAI

Walkthrough on how to get started with Azure OpenAI and make your first completions call.

Azure OpenAI embeddings tutorial - Azure OpenAI

Learn how to use the Azure OpenAI embeddings API for document search with the BillSum dataset

Azure OpenAI REST API reference - Azure OpenAI

Learn how to use the Azure OpenAI REST API. In this article, you'll learn about authorization options, how to structure a request and receive a response.

Azure OpenAI content filtering - Azure OpenAI

Learn about the content filtering capabilities of the OpenAI service in Azure Cognitive Services

Show 5 more

Learn how to generate embeddings with Azure OpenAI

Article • 04/20/2023


An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.


How to get embeddings

To obtain an embedding vector for a piece of text, we make a request to the embeddings endpoint as shown in the following code snippets:


console


curl

https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPL OYMENT_NAME/embeddings?api-version=2022-12-01\

-H 'Content-Type: application/json' \

-H 'api-key: YOUR_API_KEY' \

-d '{"input": "Sample Document goes here"}'

Console


Best Practices

Verify inputs don't exceed the maximum length

The maximum length of input text for our embedding models is 2048 tokens (equivalent to around 2-3 pages of text). You should verify that your inputs don't exceed this limit before making a request.


Choose the best model for your task

<search_model>-

doc

For the search models, you can obtain embeddings in two ways. The model is used for longer pieces of text (to be searched over) and the

<search_model>-query

model is used for shorter pieces of text, typically queries or class

labels in zero shot classification. You can read more about all of the Embeddings models in our Models guide.


Replace newlines with a single space

Unless you're embedding code, we suggest replacing newlines (\n) in your input with a single space, as we have observed inferior results when newlines are present.


Limitations & risks

Our embedding models may be unreliable or pose social risks in certain cases, and may cause harm in the absence of mitigations. Review our Responsible AI content for more information on how to approach their use responsibly.


Next steps

Learn more about using Azure OpenAI and embeddings to perform document search with our embeddings tutorial.

Learn more about the underlying models that power Azure OpenAI.

Learn how to prepare your dataset for fine-tuning

Article • 03/14/2023 • 11 minutes to read


The first step of customizing your model is to prepare a high quality dataset. To do this you'll need a set of training examples composed of single input prompts and the associated desired output ('completion'). This format is notably different than using models during inference in the following ways:

Only provide a single prompt vs a few examples.

You don't need to provide detailed instructions as part of the prompt.

Each prompt should end with a fixed separator to inform the model when the prompt ends and the completion begins. A simple separator, which generally works well is \n\n###\n\n . The separator shouldn't appear elsewhere in any prompt.

Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.

Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be \n , ### , or any other token that doesn't appear in any completion.

For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.

The dataset cannot exceed 100 MB in total file size.


Best practices

Customization performs better with high-quality examples and the more you have, generally the better the model performs. We recommend that you provide at least a few hundred high-quality examples to achieve a model that performs better than using well-designed prompts with a base model. From there, performance tends to linearly increase with every doubling of the number of examples. Increasing the number of examples is usually the best and most reliable way of improving performance.

If you're fine-tuning on a pre-existing dataset rather than writing prompts from scratch, be sure to manually review your data for offensive or inaccurate content if possible, or review as many random samples of the dataset as possible if it's large.

Specific guidelines

Fine-tuning can solve various problems, and the optimal way to use it may depend on your specific use case. Below, we've listed the most common use cases for fine-tuning and corresponding guidelines.


Classification

Classifiers are the easiest models to get started with. For classification problems we suggest using ada, which generally tends to perform only very slightly worse than more capable models once fine-tuned, while being significantly faster. In classification problems, each prompt in the dataset should be classified into one of the predefined classes. For this type of problem, we recommend:

Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.

Choose classes that map to a single token. At inference time, specify

max_tokens=1 since you only need the first token for classification.

Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator

Aim for at least 100 examples per class

To get class log probabilities, you can specify logprobs=5 (for five classes) when using your model

Ensure that the dataset used for fine-tuning is very similar in structure and type of task as what the model will be used for


Case study: Is the model making untrue statements?

Let's say you'd like to ensure that the text of the ads on your website mention the correct product and company. In other words, you want to ensure the model isn't making things up. You may want to fine-tune a classifier which filters out incorrect ads.

The dataset might look something like the following:


JSON


{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}

{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

In the example above, we used a structured input containing the name of the company,

\nSupported:

the product, and the associated ad. As a separator we used which clearly

separated the prompt from the completion. With a sufficient number of examples, the separator you choose doesn't make much of a difference (usually less than 0.4%) as long as it doesn't appear within the prompt or the completion.

For this use case we fine-tuned an ada model since it is faster and cheaper, and the performance is comparable to larger models because it's a classification task.

Now we can query our model by making a Completion request.


Console


curl

https://YOUR_RESOURCE_NAME.openaiazure.com/openai/deployments/YOUR_DEPLOYMEN T_NAME/completions?api-version=2022-12-01\ \

-H 'Content-Type: application/json' \

-H 'api-key: YOUR_API_KEY' \

-d '{

"prompt": "Company: Reliable accountants Ltd\nProduct: Personal Tax help\nAd:Best advice in town!\nSupported:",

"max_tokens": 1

}'


yes

Which will return either or no .


Case study: Sentiment analysis

Let's say you'd like to get a degree to which a particular tweet is positive or negative. The dataset might look something like the following:


Console


{"prompt":"Overjoyed with the new iPhone! ->", "completion":" positive"}

{"prompt":"@contoso_basketball disappoint for a third straight night. ->", "completion":" negative"}

Once the model is fine-tuned, you can get back the log probabilities for the first

logprobs=2

completion token by setting on the completion request. The higher the

Console

probability for positive class, the higher the relative sentiment. Now we can query our model by making a Completion request.


curl

https://YOUR_RESOURCE_NAME.openaiazure.com/openai/deployments/YOUR_DEPLOYMEN T_NAME/completions?api-version=2022-12-01\ \

-H 'Content-Type: application/json' \

-H 'api-key: YOUR_API_KEY' \

-d '{

"prompt": "Excited to share my latest blog post! ->", "max_tokens": 1,

"logprobs": 2

}'


Which will return:


JSON


{



"object": "text_completion",


"created": 1589498378,


"model": "YOUR_FINE_TUNED_MODEL_NAME",


"choices": [


{


"logprobs": {


"text_offset": [


19


],


"token_logprobs": [


-0.03597255


],


"tokens": [


" positive"


],


"top_logprobs": [


{


" negative": -4.9785037,


" positive": -0.03597255


}


]


},


"text": " positive",


"index": 0,


"finish_reason": "length"


}


]

}



Case study: Categorization for Email triage

Let's say you'd like to categorize incoming email into one of a large number of predefined categories. For classification into a large number of categories, we recommend you convert those categories into numbers, which will work well with up to approximately 500 categories. We've observed that adding a space before the number sometimes slightly helps the performance, due to tokenization. You may want to structure your training data as follows:


JSON


{

"prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:

<date>\nContent:<email_body>\n\n###\n\n", "completion":"

<numerical_category>"

}

For example:


JSON


{

"prompt":"Subject: Update my address\nFrom:Joe

Doe\nTo:support@ourcompany.com\nDate:2021-06-03\nContent:Hi,\nI would like to update my billing address to match my delivery address.\n\nPlease let me know once done.\n\nThanks,\nJoe\n\n###\n\n",

"completion":" 4"

}

\n\n###\n\n

In the example above we used an incoming email capped at 2043 tokens as input. (This allows for a four token separator and a one token completion, summing up to 2048.) As

a separator we used email.

and we removed any occurrence of ### within the


Conditional generation

Conditional generation is a problem where the content needs to be generated given some kind of input. This includes paraphrasing, summarizing, entity extraction, product description writing given specifications, chatbots and many others. For this type of problem we recommend:

Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.

Use an ending token at the end of the completion, for example, END .

Remember to add the ending token as a stop sequence during inference, for example, stop=[" END"] .

Aim for at least ~500 examples.

Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator.

Ensure the examples are of high quality and follow the same desired format. Ensure that the dataset used for fine-tuning is similar in structure and type of task as what the model will be used for.

Using Lower learning rate and only 1-2 epochs tends to work better for these use cases.


Case study: Write an engaging ad based on a Wikipedia article

This is a generative use case so you would want to ensure that the samples you provide are of the highest quality, as the fine-tuned model will try to imitate the style (and mistakes) of the given examples. A good starting point is around 500 examples. A sample dataset might look like this:


JSON


{

"prompt":"<Product Name>\n<Wikipedia description>\n\n###\n\n", "completion":" <engaging ad> END"

}

For example:


JSON


{

"prompt":"Samsung Galaxy Feel\nThe Samsung Galaxy Feel is an Android smartphone developed by Samsung Electronics exclusively for the Japanese

market. The phone was released in June 2017 and was sold by NTT Docomo. It runs on Android 7.0 (Nougat), has a 4.7 inch display, and a 3000 mAh

battery.\nSoftware\nSamsung Galaxy Feel runs on Android 7.0 (Nougat), but can be later updated to Android 8.0 (Oreo).\nHardware\nSamsung Galaxy Feel has a 4.7 inch Super AMOLED HD display, 16 MP back facing and 5 MP front

facing cameras. It has a 3000 mAh battery, a 1.6 GHz Octa-Core ARM Cortex-

A53 CPU, and an ARM Mali-T830 MP1 700 MHz GPU. It comes with 32GB of

internal storage, expandable to 256GB via microSD. Aside from its software and hardware specifications, Samsung also introduced a unique a hole in the phone's shell to accommodate the Japanese perceived penchant for

personalizing their mobile phones. The Galaxy Feel's battery was also touted as a major selling point since the market favors handsets with longer

battery life. The device is also waterproof and supports 1seg digital broadcasts using an antenna that is sold separately.\n\n###\n\n",

"completion":"Looking for a smartphone that can do it all? Look no

further than Samsung Galaxy Feel! With a slim and sleek design, our latest smartphone features high-quality picture and video capabilities, as well as


an award winning battery life. END"

}


Here we used a multiline separator, as Wikipedia articles contain multiple paragraphs and headings. We also used a simple end token, to ensure that the model knows when the completion should finish.


Case study: Entity extraction

This is similar to a language transformation task. To improve the performance, it's best to either sort different extracted entities alphabetically or in the same order as they appear in the original text. This helps the model to keep track of all the entities which need to be generated in order. The dataset could look as follows:


JSON


{

"prompt":"<any text, for example news article>\n\n###\n\n",

"completion":" <list of entities, separated by a newline> END"

}

For example:


JSON


{

"prompt":"Portugal will be removed from the UK's green travel list from Tuesday, amid rising coronavirus cases and concern over a \"Nepal mutation of the so-called Indian variant\". It will join the amber list, meaning

holidaymakers should not visit and returnees must isolate for 10 days...\n\n###\n\n",

"completion":" Portugal\nUK\nNepal mutation\nIndian variant END"

}

A multi-line separator works best, as the text will likely contain multiple lines. Ideally there will be a high diversity of the types of input prompts (news articles, Wikipedia pages, tweets, legal documents), which reflect the likely texts which will be encountered when extracting entities.


Case study: Customer support chatbot

A chatbot will normally contain relevant context about the conversation (order details), summary of the conversation so far, and most recent messages. For this use case the same past conversation can generate multiple rows in the dataset, each time with a

slightly different context, for every agent generation as a completion. This use case requires a few thousand examples, as it likely deals with different types of requests, and customer issues. To ensure the performance is of high quality, we recommend vetting the conversation samples to ensure the quality of agent messages. The summary can be generated with a separate text transformation fine tuned model. The dataset could look as follows:


JSON


{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural

language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer:

<message2>\nAgent:", "completion":" <response2>\n"}

{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural

language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer:

<message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}

\n

Here we purposefully separated different types of input information, but maintained Customer Agent dialog in the same format between a prompt and a completion. All the

completions should only be by the agent, and we can use doing inference.

as a stop sequence when


Case study: Product description based on a technical list of properties

Here it's important to convert the input data into a natural language, which will likely lead to superior performance. For example, the following format:


JSON


{

"prompt":"Item=handbag, Color=army_green, price=$99, size=S->",

"completion":"This stylish small green handbag will add a unique touch to your look, without costing you a fortune."

}

Won't work as well as:


JSON


{

"prompt":"Item is a handbag. Colour is army green. Price is midrange.

Size is small.->",

"completion":"This stylish small green handbag will add a unique touch


to your look, without costing you a fortune."

}


For high performance, ensure that the completions were based on the description provided. If external content is often consulted, then adding such content in an automated way would improve the performance. If the description is based on images, it may help to use an algorithm to extract a textual description of the image. Since completions are only one sentence long, we can use . as the stop sequence during inference.


Open ended generation

For this type of problem we recommend: Leave the prompt empty.

No need for any separators.

You'll normally want a large number of examples, at least a few thousand. Ensure the examples cover the intended domain or the desired tone of voice.


Case study: Maintaining company voice

Many companies have a large amount of high quality content generated in a specific voice. Ideally all generations from our API should follow that voice for the different use cases. Here we can use the trick of leaving the prompt empty, and feeding in all the documents which are good examples of the company voice. A fine-tuned model can be used to solve many different use cases with similar prompts to the ones used for base models, but the outputs are going to follow the company voice much more closely than previously.


JSON


{"prompt":"", "completion":" <company voice textual content>"}

{"prompt":"", "completion":" <company voice textual content2>"}

A similar technique could be used for creating a virtual character with a particular personality, style of speech and topics the character talks about.

Generative tasks have a potential to leak training data when requesting completions from the model, so extra care needs to be taken that this is addressed appropriately. For example personal or sensitive company information should be replaced by generic information or not be included into fine-tuning in the first place.

Next steps

Fine tune your model with our How-to guide

Learn more about the underlying models that power Azure OpenAI Service

Learn how to customize a model for your application

Article • 04/05/2023 • 51 minutes to read


Azure OpenAI Service lets you tailor our models to your personal datasets using a process known as fine-tuning. This customization step will let you get more out of the service by providing:

Higher quality results than what you can get just from prompt design The ability to train on more examples than can fit into a prompt Lower-latency requests

A customized model improves on the few-shot learning approach by training the model's weights on your specific prompts and structure. The customized model lets you achieve better results on a wider number of tasks without needing to provide examples in your prompt. The result is less text sent and fewer tokens processed on every API call, saving cost and improving request latency.


Note

There is a breaking change in the create fine tunes command in the latest 12-01-2022 GA API. For the latest command syntax consult the reference documentation


Prerequisites

An Azure subscription - Create one for free


Access granted to Azure OpenAI in the desired Azure subscription


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource


For more information about creating a resource, see Create a resource and deploy a model using Azure OpenAI.


Fine-tuning workflow

The fine-tuning workflow in Azure OpenAI Studio requires the following steps:


  1. Prepare your training and validation data

  2. Use the Create customized model wizard in Azure OpenAI Studio to train your customized model

    1. Select a base model

    2. Choose your training data

    3. Optionally, choose your validation data

    4. Optionally, choose advanced options for your fine-tune job

    5. Review your choices and train your new customized model

  3. Check the status of your customized model

  4. Deploy your customized model for use

  5. Use your customized model

  6. Optionally, analyze your customized model for performance and fit


Prepare your training and validation data

Your training data and validation data sets consist of input & output examples for how you would like the model to perform.

The training and validation data you use must be formatted as a JSON Lines (JSONL) document in which each line represents a single prompt-completion pair. The OpenAI command-line interface (CLI) includes a data preparation tool that validates, gives suggestions, and reformats your training data into a JSONL file ready for fine-tuning.

Here's an example of the training data format:


JSON


{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM), and the file must be less than 200 MB in size. For more information about formatting your training data, see Learn how to prepare your dataset for fine-tuning.


Creating your training and validation datasets

Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of our GPT-3 base models. Prompts for completion calls

often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, we recommend that each training example consists of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.

The more training examples you have, the better. We recommend having at least 200 training examples. In general, we've found that each doubling of the dataset size leads to a linear increase in model quality.

For more information about preparing training data for various tasks, see Learn how to prepare your dataset for fine-tuning.


OpenAI CLI data preparation tool

We recommend using OpenAI's command-line interface (CLI) to assist with many of the data preparation steps. OpenAI has developed a tool that validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.

To install the CLI, run the following Python command:


Console


pip install --upgrade openai

<LOCAL_FILE>

To analyze your training data with the data preparation tool, run the following Python

command, replacing file to be analyzed:

with the full path and file name of the training data


Console


openai tools fine_tunes.prepare_data -f <LOCAL_FILE>


This tool accepts files in the following data formats, if they contain a prompt and a completion column/key:

Comma-separated values (CSV) Tab-separated values (TSV) Microsoft Excel workbook (XLSX) JavaScript Object Notation (JSON) JSON Lines (JSONL)

The tool reformats your training data and saves output into a JSONL file ready for fine-tuning, after guiding you through the process of implementing suggested changes.

Use the Create customized model wizard

Azure OpenAI Studio provides the Create customized model wizard, so you can interactively create and train a fine-tuned model for your Azure resource.


Go to the Azure OpenAI Studio

Navigate to the Azure OpenAI Studio at https://oai.azure.com/ and sign in with credentials that have access to your Azure OpenAI resource. During the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.


Landing page

You'll first land on our main page for Azure OpenAI Studio. From here, you can start fine-tuning a custom model.

Select the Start fine-tuning a custom model button under Manage deployments and models section of the landing page, highlighted in the following picture, to start fine-tuning a custom model.


Note

If your resource doesn't have a model already deployed in it, a warning is displayed. You can ignore that warning for the purposes of fine-tuning a model, because you'll be fine-tuning and deploying a new customized model.


Start the wizard from the Models page

To create a customized model, select the Create customized model button under the Provided models section on the Models page, highlighted in the following picture, to start the Create customized model wizard.



Select a base model

The first step in creating a customized model is to choose a base model. The Base

model pane lets you choose a base model to use for your customized model, and the choice influences both the performance and the cost of your model. You can create a customized model from one of the following available base models:


ada


babbage


curie


code-cushman-001 *


davinci *


* currently unavailable for new customers.


For more information about our base models that can be fine-tuned, see Models. Select a base model from the Base model type dropdown, as shown in the following picture, and then select Next to continue.



Choose your training data

The next step is to either choose existing prepared training data or upload new prepared training data to use when customizing your model. The Training data pane, shown in the following picture, displays any existing, previously uploaded datasets and provides options by which you can upload new training data.



If your training data has already been uploaded to the service, select Choose dataset, and then select the file from the list shown in the Training data pane. Otherwise, select either Local file to upload training data from a local file, or Azure blob or other shared web locations to import training data from Azure Blob or another shared web location.

For large data files, we recommend you import from an Azure Blob store. Large files can become unstable when uploaded through multipart forms because the requests are atomic and can't be retried or resumed. For more information about Azure Blob storage, see What is Azure Blob storage?


Note

Training data files must be formatted as JSONL files, encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.


To upload training data from a local file

You can upload a new training dataset to the service from a local file by using one of the following methods:

Drag and drop the file into the client area of the Training data pane, and then select Upload file

Select Browse for a file from the client area of the Training data pane, choose the file to upload from the Open dialog, and then select Upload file.

After you've selected and uploaded the training dataset, select Next to optionally choose your validation data.



To import training data from an Azure Blob store

You can import a training dataset from Azure Blob or another shared web location by providing the name and location of the file, as shown in the following picture. Enter the name of the file in File name and the Azure Blob URL, Azure Storage shared access signature (SAS), or other link to an accessible shared web location that contains the file in File location, then select Upload file to import the training dataset to the service.

After you've selected and uploaded the training dataset, select Next to optionally choose your validation data.


Choose your validation data

You can now choose to optionally use validation data in the training process of your fine-tuned model. If you don't want to use validation data, you can choose Next to choose advanced options for your model. Otherwise, if you have a validation dataset, you can either choose existing prepared validation data or upload new prepared validation data to use when customizing your model. The Validation data pane, shown in the following picture, displays any existing, previously uploaded training and validation datasets and provides options by which you can upload new validation data.


If your validation data has already been uploaded to the service, select Choose dataset, and then select the file from the list shown in the Validation data pane. Otherwise, select either Local file to upload validation data from a local file, or Azure blob or other shared web locations to import validation data from Azure Blob or another shared web location.

For large data files, we recommend you import from an Azure Blob store. Large files can become unstable when uploaded through multipart forms because the requests are atomic and can't be retried or resumed.


Note

Like training data files, validation data files must be formatted as JSONL files, encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.


To upload validation data from a local file

You can upload a new validation dataset to the service from a local file by using one of the following methods:

Drag and drop the file into the client area of the Validation data pane, and then select Upload file

Select Browse for a file from the client area of the Validation data pane, choose the file to upload from the Open dialog, and then select Upload file.

After you've uploaded the validation dataset, select Next to optionally choose advanced options.



To import validation data from an Azure Blob store

You can import a validation dataset from Azure Blob or another shared web location by providing the name and location of the file, as shown in the following picture. Enter the name of the file in File name and the Azure Blob URL, Azure Storage shared access signature (SAS), or other link to an accessible shared web location that contains the file in File location, then select Upload file to import the validation dataset to the service.

After you've imported the validation dataset, select Next to optionally choose advanced options.


Choose advanced options

You can either use default values for the hyperparameters of the fine-tune job that the wizard runs to train your fine-tuned model, or you can adjust those hyperparameters for your customization needs in the Advanced options pane, shown in the following picture.


Either select Default to use the default values for the fine-tune job, or select Advanced

to display and edit the hyperparameter values, as shown in the following picture.


The following hyperparameters are available:


Parameter name

Description

Number of epochs

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

Batch size

The batch size to use for training. The batch size is the number of training examples used to train a single forward and backward pass.

Learning rate

multiplier

The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training, multiplied by this value.

Prompt loss weight

The weight to use for loss on the prompt tokens. This value controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0.) Increasing this value can add a stabilizing effect to training when completions are short.

For more information about these hyperparameters, see the Create a Fine tune job section of the REST API documentation.

After you've chosen either default or advanced options, select Next to review your choices and train your fine-tuned model.


Review your choices and train your model

The Review and train pane of the wizard displays information about the choices you've made in the Create customized model wizard for your fine-tuned model, as shown in the following picture.


If you're ready to train your model, select Save and close to start the fine-tune job and return to the Models page.


Check the status of your customized model

The Models page displays information about your customized model in the Customized models tab, as shown in the following picture. The tab includes information about the status and job ID of the fine-tune job for your customized model. When the job is completed, the file ID of the result file is also displayed.


After you've started a fine-tune job, it may take some time to complete. Your job may be queued behind other jobs on our system, and training your model can take minutes or hours depending on the model and dataset size. You can check the status of the fine-tune job for your customized model in the Status column of the Customized models tab on the Models page, and you can select Refresh to update the information on that page.

You can also select the name of the model from the Model name column of the Models page to display more information about your customized model, including the status of the fine-tune job, training results, training events, and hyperparameters used in the job. You can select the Refresh button to refresh the information for your model, as shown in the following picture.



From the model page, you can also select Download training file to download the training data you used for the model, or select Download results to download the result file attached to the fine-tune job for your model and analyze your customized model for training and validation performance.


Deploy a customized model

When the fine-tune job has succeeded, you can deploy the customized model from the Models pane. You must deploy your customized model to make it available for use with completion calls.

Important

After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment will automatically be deleted. The deployment of a customized model is “inactive” if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment does NOT delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI

Service pricing , each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.


Note

Only one deployment is permitted for a customized model. An error message is displayed if you select an already-deployed customized model.


To deploy your customized model, select the customized model to be deployed and then select Deploy model, as shown in the following picture.



The Deploy model dialog is presented, in which you can provide a name for the deployment of your customized model. Enter a name in Deployment name and then select Create to start the deployment of your customized model.


You can monitor the progress of your deployment from the Deployments pane of Azure OpenAI Studio.


Use a deployed customized model

Once your customized model has been deployed, you can use it like any other deployed model. For example, you can use the Playground pane of Azure OpenAI Studio to experiment with your new deployment, as shown in the following picture. You can continue to use the same parameters with your customized model, such as temperature and frequency penalty, as you can with other deployed models.


Note

As with all applications, we require a review process prior to going live.


Analyze your customized model

Azure OpenAI attaches a result file, named results.csv , to each fine-tune job once it's completed. You can use the result file to analyze the training and validation performance of your customized model. The file ID for the result file is listed for each customized model in the Result file Id column of the Models pane for Azure OpenAI Studio. You can use the file ID to identify and download the result file from the File Management pane of Azure OpenAI Studio.

The result file is a CSV file containing a header row and a row for each training step performed by the fine-tune job. The result file contains the following columns:


Column name

Description


step

The number of the training step. A training step represents a single pass, forward and backward, on a batch of training data.


elapsed_tokens

The number of tokens the customized model has seen so far, including repeats.

Column name

Description


elapsed_examples

The number of examples the model has seen so far, including repeats.

Each example represents one element in that step's batch of training data. For example, if the Batch size parameter is set to 32 in the Advanced options pane, this value increments by 32 in each training step.


training_loss

The loss for the training batch.


training_sequence_accuracy

The percentage of completions in the training batch for which the model's predicted tokens exactly matched the true completion tokens.

For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67

(2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] .


training_token_accuracy

The percentage of tokens in the training batch that were correctly predicted by the model.

For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83

(5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] .


validation_loss

The loss for the validation batch.

validation_sequence_accuracy The percentage of completions in the validation batch for which

the model's predicted tokens exactly matched the true completion tokens.

For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67

(2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] .


validation_token_accuracy

The percentage of tokens in the validation batch that were correctly predicted by the model.

For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83

(5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] .


Clean up your deployments, customized models, and training files

When you're done with your customized model, you can delete the deployment and model. You can also delete the training and validation files you uploaded to the service, if needed.

Delete your model deployment


Important

After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment will automatically be deleted. The deployment of a customized model is “inactive” if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment does NOT delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI

Service pricing , each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.


You can delete the deployment for your customized model from the Deployments page for Azure OpenAI Studio. Select the deployment to delete, and then select Delete to delete the deployment.


Delete your customized model

You can delete a customized model from the Models page for Azure OpenAI Studio. Select the customized model to delete from the Customized models tab, and then select Delete to delete the customized model.


Note

You cannot delete a customized model if it has an existing deployment. You must first delete your model deployment before you can delete your customized model.


Delete your training files

You can optionally delete training and validation files you've uploaded for training, and result files generated during training, from the File Management page for Azure OpenAI Studio. Select the file to delete, and then select Delete to delete the file.


Next steps

Explore the full REST API Reference documentation to learn more about all the fine-tuning capabilities. You can find the full REST documentation here.

Explore more of the Python SDK operations here .

How to Configure Azure OpenAI Service with Managed Identities

Article • 05/10/2023


More complex security scenarios require Azure role-based access control (Azure RBAC). This document covers how to authenticate to your OpenAI resource using Azure Active Directory (Azure AD).

In the following sections, you'll use the Azure CLI to assign roles, and obtain a bearer token to call the OpenAI resource. If you get stuck, links are provided in each section with all available options for each command in Azure Cloud Shell/Azure CLI.


Prerequisites

An Azure subscription - Create one for free


Access granted to the Azure OpenAI service in the desired Azure subscription


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

Azure CLI - Installation Guide


The following Python libraries: os, requests, json


Sign into the Azure CLI

To sign-in to the Azure CLI, run the following command and complete the sign-in. You may need to do it again if your session has been idle for too long.


Azure CLI


az login


Assign yourself to the Cognitive Services User role

Assigning yourself to the Cognitive Services User role will allow you to use your account for access to the specific cognitive services resource

  1. Get your user information


    Azure CLI


    export user=$(az account show -o json | jq -r .user.name)

  2. Assign yourself to “Cognitive Services User” role.


    Azure CLI


    export resourceId=$(az group show -g $myResourceGroupName -o json | jq

    -r .id)

    az role assignment create --role "Cognitive Services User" --assignee

    $user --scope $resourceId


    Note

    Role assignment change will take ~5 mins to become effective.

  3. Acquire an Azure AD access token. Access tokens expire in one hour. you'll then need to acquire another one.


    Azure CLI


    export accessToken=$(az account get-access-token --resource

    https://cognitiveservices.azure.com -o json | jq -r .accessToken)

    Authorization

  4. Make an API call Use the access token to authorize your API call by setting the header value.


Bash


curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions? api-version=2022-12-01 \

-H "Content-Type: application/json" \

-H "Authorization: Bearer $accessToken" \

-d '{ "prompt": "Once upon a time" }'


Authorize access to managed identities

OpenAI supports Azure Active Directory (Azure AD) authentication with managed identities for Azure resources. Managed identities for Azure resources can authorize access to Cognitive Services resources using Azure AD credentials from applications running in Azure virtual machines (VMs), function apps, virtual machine scale sets, and other services. By using managed identities for Azure resources together with Azure AD authentication, you can avoid storing credentials with your applications that run in the cloud.


Enable managed identities on a VM

Before you can use managed identities for Azure resources to authorize access to Cognitive Services resources from your VM, you must enable managed identities for Azure resources on the VM. To learn how to enable managed identities for Azure Resources, see:

Azure portal Azure PowerShell Azure CLI

Azure Resource Manager template Azure Resource Manager client libraries

For more information about managed identities, see Managed identities for Azure resources.

Configure Azure Cognitive Services virtual networks

Article • 03/08/2023


Azure Cognitive Services provides a layered security model. This model enables you to secure your Cognitive Services accounts to a specific subset of networks. When network rules are configured, only applications requesting data over the specified set of networks can access the account. You can limit access to your resources with request filtering. Allowing only requests originating from specified IP addresses, IP ranges or from a list of subnets in Azure Virtual Networks.

An application that accesses a Cognitive Services resource when network rules are in effect requires authorization. Authorization is supported with Azure Active Directory (Azure AD) credentials or with a valid API key.


Important

Turning on firewall rules for your Cognitive Services account blocks incoming requests for data by default. In order to allow requests through, one of the following conditions needs to be met:

The request should originate from a service operating within an Azure Virtual Network (VNet) on the allowed subnet list of the target Cognitive Services account. The endpoint in requests originated from VNet needs to be set as the custom subdomain of your Cognitive Services account.

Or the request should originate from an allowed list of IP addresses.


Requests that are blocked include those from other Azure services, from the Azure portal, from logging and metrics services, and so on.


Note

We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.


Scenarios

To secure your Cognitive Services resource, you should first configure a rule to deny access to traffic from all networks (including internet traffic) by default. Then, you should configure rules that grant access to traffic from specific VNets. This configuration enables you to build a secure network boundary for your applications. You can also configure rules to grant access to traffic from select public internet IP address ranges, enabling connections from specific internet or on-premises clients.

Network rules are enforced on all network protocols to Azure Cognitive Services, including REST and WebSocket. To access data using tools such as the Azure test consoles, explicit network rules must be configured. You can apply network rules to existing Cognitive Services resources, or when you create new Cognitive Services resources. Once network rules are applied, they're enforced for all requests.


Supported regions and service offerings

Virtual networks (VNETs) are supported in regions where Cognitive Services are

available . Cognitive Services supports service tags for network rules configuration. The services listed below are included in the CognitiveServicesManagement service tag.


Anomaly Detector

Azure OpenAI

Computer Vision

Content Moderator

Custom Vision

Face

Language Understanding (LUIS)

Personalizer

Speech service

Language service

QnA Maker

Translator Text


Note

If you're using, Azure OpenAI, LUIS, Speech Services, or Language services, the

CognitiveServicesManagement tag only enables you use the service using the SDK or REST API. To access and use Azure OpenAI Studio, LUIS portal , Speech Studio or Language Studio from a virtual network, you will need to use the following tags:


AzureActiveDirectory AzureFrontDoor.Frontend

AzureResourceManager

CognitiveServicesManagement CognitiveServicesFrontEnd


Change the default network access rule

By default, Cognitive Services resources accept connections from clients on any network. To limit access to selected networks, you must first change the default action.


Warning

Making changes to network rules can impact your applications' ability to connect to Azure Cognitive Services. Setting the default network rule to deny blocks all access to the data unless specific network rules that grant access are also applied. Be sure to grant access to any allowed networks using network rules before you change the default rule to deny access. If you are allow listing IP addresses for your on-premises network, be sure to add all possible outgoing public IP addresses from your on-premises network.


Managing default network access rules

You can manage default network access rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.



  1. Go to the Cognitive Services resource you want to secure.


  2. Select the RESOURCE MANAGEMENT menu called Virtual network.

Azure portal

  1. To deny access by default, choose to allow access from Selected networks. With the Selected networks setting alone, unaccompanied by configured

    Virtual networks or Address ranges - all access is effectively denied. When all access is denied, requests attempting to consume the Cognitive Services resource aren't permitted. The Azure portal, Azure PowerShell or, Azure CLI can still be used to configure the Cognitive Services resource.

  2. To allow traffic from all networks, choose to allow access from All networks.



  3. Select Save to apply your changes.


Grant access from a virtual network

You can configure Cognitive Services resources to allow access only from specific subnets. The allowed subnets may belong to a VNet in the same subscription, or in a different subscription, including subscriptions belonging to a different Azure Active Directory tenant.

Enable a service endpoint for Azure Cognitive Services within the VNet. The service endpoint routes traffic from the VNet through an optimal path to the Azure Cognitive Services service. The identities of the subnet and the virtual network are also transmitted with each request. Administrators can then configure network rules for the Cognitive Services resource that allow requests to be received from specific subnets in a VNet.

Clients granted access via these network rules must continue to meet the authorization requirements of the Cognitive Services resource to access the data.

Each Cognitive Services resource supports up to 100 virtual network rules, which may be combined with IP network rules.


Required permissions

To apply a virtual network rule to a Cognitive Services resource, the user must have the appropriate permissions for the subnets being added. The required permission is the default Contributor role, or the Cognitive Services Contributor role. Required permissions can also be added to custom role definitions.

Cognitive Services resource and the virtual networks granted access may be in different subscriptions, including subscriptions that are a part of a different Azure AD tenant.


Note

Configuration of rules that grant access to subnets in virtual networks that are a part of a different Azure Active Directory tenant are currently only supported through PowerShell, CLI and REST APIs. Such rules cannot be configured through the Azure portal, though they may be viewed in the portal.


Managing virtual network rules

You can manage virtual network rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.



  1. Go to the Cognitive Services resource you want to secure.


  2. Select the RESOURCE MANAGEMENT menu called Virtual network.


  3. Check that you've selected to allow access from Selected networks.


  4. To grant access to a virtual network with an existing network rule, under

Virtual networks, select Add existing virtual network.

Azure portal


5. Select the Virtual networks and Subnets options, and then select Enable.



  1. To create a new virtual network and grant it access, select Add new virtual network.



  2. Provide the information necessary to create the new virtual network, and then select Create.


Note

If a service endpoint for Azure Cognitive Services wasn't previously configured for the selected virtual network and subnets, you can configure it as part of this operation.

Presently, only virtual networks belonging to the same Azure Active Directory tenant are shown for selection during rule creation. To grant access to a subnet in a virtual network belonging to another tenant, please use PowerShell, CLI or REST APIs.

  1. To remove a virtual network or subnet rule, select ... to open the context menu for the virtual network or subnet, and select Remove.



  2. Select Save to apply your changes.


Important

Be sure to set the default rule to deny, or network rules have no effect.


Grant access from an internet IP range

You can configure Cognitive Services resources to allow access from specific public internet IP address ranges. This configuration grants access to specific services and on-premises networks, effectively blocking general internet traffic.

16.17.18.0/24

Provide allowed internet address ranges using CIDR notation in the form or as individual IP addresses like 16.17.18.19 .


Tip

Small address ranges using "/31" or "/32" prefix sizes are not supported. These ranges should be configured using individual IP address rules.

IP network rules are only allowed for public internet IP addresses. IP address ranges reserved for private networks (as defined in RFC 1918 ) aren't allowed in IP rules.

172.16.*

Private networks include addresses that start with 10.* , - 172.31.* , and


192.168.* .


Only IPV4 addresses are supported at this time. Each Cognitive Services resource supports up to 100 IP network rules, which may be combined with Virtual network rules.


Configuring access from on-premises networks

To grant access from your on-premises networks to your Cognitive Services resource with an IP network rule, you must identify the internet facing IP addresses used by your network. Contact your network administrator for help.

If you're using ExpressRoute on-premises for public peering or Microsoft peering, you'll need to identify the NAT IP addresses. For public peering, each ExpressRoute circuit by default uses two NAT IP addresses. Each is applied to Azure service traffic when the traffic enters the Microsoft Azure network backbone. For Microsoft peering, the NAT IP addresses that are used are either customer provided or are provided by the service provider. To allow access to your service resources, you must allow these public IP addresses in the resource IP firewall setting. To find your public peering ExpressRoute circuit IP addresses, open a support ticket with ExpressRoute via the Azure portal.

Learn more about NAT for ExpressRoute public and Microsoft peering.


Managing IP network rules

You can manage IP network rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.



  1. Go to the Cognitive Services resource you want to secure.


  2. Select the RESOURCE MANAGEMENT menu called Virtual network.


  3. Check that you've selected to allow access from Selected networks.


  4. To grant access to an internet IP range, enter the IP address or address range (in CIDR format ) under Firewall > Address Range. Only valid public IP (non-reserved) addresses are accepted.

Azure portal

  1. To remove an IP network rule, select the trash can icon next to the address range.



  2. Select Save to apply your changes.


Important

Be sure to set the default rule to deny, or network rules have no effect.

Use private endpoints

You can use private endpoints for your Cognitive Services resources to allow clients on a virtual network (VNet) to securely access data over a Private Link. The private endpoint uses an IP address from the VNet address space for your Cognitive Services resource.

Network traffic between the clients on the VNet and the resource traverses the VNet and a private link on the Microsoft backbone network, eliminating exposure from the public internet.

Private endpoints for Cognitive Services resources let you:


Secure your Cognitive Services resource by configuring the firewall to block all connections on the public endpoint for the Cognitive Services service.

Increase security for the VNet, by enabling you to block exfiltration of data from the VNet.

Securely connect to Cognitive Services resources from on-premises networks that connect to the VNet using VPN or ExpressRoutes with private-peering.


Conceptual overview

A private endpoint is a special network interface for an Azure resource in your VNet. Creating a private endpoint for your Cognitive Services resource provides secure connectivity between clients in your VNet and your resource. The private endpoint is assigned an IP address from the IP address range of your VNet. The connection between the private endpoint and the Cognitive Services service uses a secure private link.

Applications in the VNet can connect to the service over the private endpoint seamlessly, using the same connection strings and authorization mechanisms that they would use otherwise. The exception is the Speech Services, which require a separate endpoint. See the section on Private endpoints with the Speech Services. Private endpoints can be used with all protocols supported by the Cognitive Services resource, including REST.

Private endpoints can be created in subnets that use Service Endpoints. Clients in a subnet can connect to one Cognitive Services resource using private endpoint, while using service endpoints to access others.

When you create a private endpoint for a Cognitive Services resource in your VNet, a consent request is sent for approval to the Cognitive Services resource owner. If the user requesting the creation of the private endpoint is also an owner of the resource, this consent request is automatically approved.

Cognitive Services resource owners can manage consent requests and the private endpoints, through the 'Private endpoints' tab for the Cognitive Services resource in the Azure portal .


Private endpoints

When creating the private endpoint, you must specify the Cognitive Services resource it connects to. For more information on creating a private endpoint, see:

Create a private endpoint using the Private Link Center in the Azure portal Create a private endpoint using Azure CLI

Create a private endpoint using Azure PowerShell


Connecting to private endpoints


Note

Azure OpenAI Service uses a different private DNS zone and public DNS zone forwarder than other Azure Cognitive Services. Refer to the Azure services DNS zone configuration article for the correct zone and forwader names.


Clients on a VNet using the private endpoint should use the same connection string for the Cognitive Services resource as clients connecting to the public endpoint. The exception is the Speech Services, which require a separate endpoint. See the section on Private endpoints with the Speech Services. We rely upon DNS resolution to automatically route the connections from the VNet to the Cognitive Services resource over a private link.

We create a private DNS zone attached to the VNet with the necessary updates for the private endpoints, by default. However, if you're using your own DNS server, you may need to make additional changes to your DNS configuration. The section on DNS changes below describes the updates required for private endpoints.


Private endpoints with the Speech Services

See Using Speech Services with private endpoints provided by Azure Private Link.


DNS changes for private endpoints

When you create a private endpoint, the DNS CNAME resource record for the Cognitive Services resource is updated to an alias in a subdomain with the prefix 'privatelink'. By default, we also create a private DNS zone, corresponding to the 'privatelink' subdomain, with the DNS A resource records for the private endpoints.

When you resolve the endpoint URL from outside the VNet with the private endpoint, it resolves to the public endpoint of the Cognitive Services resource. When resolved from the VNet hosting the private endpoint, the endpoint URL resolves to the private endpoint's IP address.

This approach enables access to the Cognitive Services resource using the same connection string for clients in the VNet hosting the private endpoints and clients outside the VNet.

If you are using a custom DNS server on your network, clients must be able to resolve the fully qualified domain name (FQDN) for the Cognitive Services resource endpoint to the private endpoint IP address. Configure your DNS server to delegate your private link subdomain to the private DNS zone for the VNet.


Tip

When using a custom or on-premises DNS server, you should configure your DNS server to resolve the Cognitive Services resource name in the 'privatelink' subdomain to the private endpoint IP address. You can do this by delegating the 'privatelink' subdomain to the private DNS zone of the VNet, or configuring the DNS zone on your DNS server and adding the DNS A records.

For more information on configuring your own DNS server to support private endpoints, refer to the following articles:

Name resolution for resources in Azure virtual networks DNS configuration for private endpoints


Pricing

For pricing details, see Azure Private Link pricing .


Next steps

Explore the various Azure Cognitive Services

Learn more about Azure Virtual Network Service Endpoints

Use Azure OpenAI with large datasets

Article • 02/17/2023 • 5 minutes to read


Azure OpenAI can be used to solve a large number of natural language tasks through prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets of examples, we have integrated the Azure OpenAI service with the distributed machine learning library SynapseML . This integration makes it easy to use the Apache Spark distributed computing framework to process millions of prompts with the OpenAI service. This tutorial shows how to apply large language models at a distributed scale using Azure Open AI and Azure Synapse Analytics.


Prerequisites

An Azure subscription - Create one for free


Access granted to Azure OpenAI in the desired Azure subscription


Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

An Azure OpenAI resource – create a resource


An Apache Spark cluster with SynapseML installed - create a serverless Apache Spark pool here

pyspark

We recommend creating a Synapse workspace, but an Azure Databricks, HDInsight, or

Spark on Kubernetes, or even a Python environment with the work.


Import this guide as a notebook

package, will also


The next step is to add this code into your Spark cluster. You can either create a notebook in your Spark platform and copy the code into this notebook to run the demo, or download the notebook and import it into Synapse Analytics.

  1. Download this demo as a notebook (click Raw, then save the file)

  2. Import the notebook into the Synapse Workspace or, if using Databricks, into the Databricks Workspace

  3. Install SynapseML on your cluster. See the installation instructions for Synapse at the bottom of the SynapseML website . This requires pasting another cell at the top of the notebook you imported

  4. Connect your notebook to a cluster and follow along, editing and running the cells below.


Fill in your service information

key

Next, edit the cell in the notebook to point to your service. In particular, set the

resource_name , deployment_name , location , and values for your Azure OpenAI resource.

variables to the corresponding


Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.


Python


import os


# Replace the following values with your Azure OpenAI resource information resource_name = "RESOURCE_NAME" # The name of your Azure OpenAI

resource.

deployment_name = "DEPLOYMENT_NAME" # The name of your Azure OpenAI deployment.

location = "RESOURCE_LOCATION" # The location or region ID for your resource.

key = "RESOURCE_API_KEY" # The key for your resource.


assert key is not None and resource_name is not None


Create a dataset of prompts

Next, create a dataframe consisting of a series of rows, with one prompt per row.


You can also load data directly from Azure Data Lake Storage (ADLS) or other databases. For more information about loading and preparing Spark dataframes, see the Apache Spark data loading guide .


Python



df = spark.createDataFrame( [

("Hello my name is",),

("The best code is code that's",), ("SynapseML is ",),

]

).toDF("prompt")


Create the OpenAICompletion Apache Spark client

To apply the OpenAI Completion service to the dataframe that you just created, create

OpenAICompletion

an object that serves as a distributed client. Parameters of the service

can be set either with a single value, or by a column of the dataframe with the

OpenAICompletion

maxTokens

appropriate setters on the object. Here, we're setting to

promptCol

200. A token is around four characters, and this limit applies to the sum of the prompt

and the result. We're also setting the column in the dataframe.

parameter with the name of the prompt


Python


from synapse.ml.cognitive import OpenAICompletion


completion = (

OpenAICompletion()

.setSubscriptionKey(key)

.setDeploymentName(deployment_name)

.setUrl("https://{}.openai.azure.com/".format(resource_name))

.setMaxTokens(200)

.setPromptCol("prompt")

.setErrorCol("error")

.setOutputCol("completions")

)


Transform the dataframe with the OpenAICompletion client

Now that you have the dataframe and the completion client, you can transform your

completions

input dataset and add a column called with all of the information the

service adds. We'll select out just the text for simplicity.


Python



from pyspark.sql.functions import col


completed_df = completion.transform(df).cache() display(completed_df.select(

col("prompt"), col("error"),

col("completions.choices.text").getItem(0).alias("text")))


Your output should look something like the following example; note that the completion text can vary.


prompt error text

Hello my name is undefined Makaveli I'm eighteen years old and I want to

be a rapper when I grow up I love writing and making music I'm from Los

Angeles, CA

The best code is undefined understandable This is a subjective statement, code that's and there is no definitive answer.

SynapseML is undefined A machine learning algorithm that is able to learn how to

predict the future outcome of events.


Other usage examples

Improve throughput with request batching

The example above makes several requests to the service, one for each prompt. To complete multiple prompts in a single request, use batch mode. First, in the

OpenAICompletion

object, instead of setting the Prompt column to "Prompt", specify "batchPrompt" for the BatchPrompt column. To do so, create a dataframe with a list of prompts per row.


Note

There is currently a limit of 20 prompts in a single request and a limit of 2048 "tokens", or approximately 1500 words.


Python


batch_df = spark.createDataFrame( [

(["The time has come", "Pleased to", "Today stocks", "Here's to"],),

(["The only thing", "Ask not what", "Every litter", "I am"],),


]

).toDF("batchPrompt")


OpenAICompletion

Next we create the object. Rather than setting the prompt column, set

the batchPrompt column if your column is of type Array[String] .


Python


batch_completion = (

OpenAICompletion()

.setSubscriptionKey(key)

.setDeploymentName(deployment_name)

.setUrl("https://{}.openai.azure.com/".format(resource_name))

.setMaxTokens(200)

.setBatchPromptCol("batchPrompt")

.setErrorCol("error")

.setOutputCol("completions")

)

In the call to transform, a request will then be made per row. Because there are multiple prompts in a single row, each request will be sent with all prompts in that row. The results will contain a row for each row in the request.


Python


completed_batch_df = batch_completion.transform(batch_df).cache() display(completed_batch_df)


Note

There is currently a limit of 20 prompts in a single request and a limit of 2048 "tokens", or approximately 1500 words.


Using an automatic mini-batcher

If your data is in column format, you can transpose it to row format using SynapseML's

FixedMiniBatcherTransformer .


Python


from pyspark.sql.types import StringType

from synapse.ml.stages import FixedMiniBatchTransformer from synapse.ml.core.spark import FluentAPI


completed_autobatch_df = (df


.coalesce(1) # Force a single partition so that our little 4-row dataframe makes a batch of size 4, you can remove this step for large datasets

.mlTransform(FixedMiniBatchTransformer(batchSize=4))

.withColumnRenamed("prompt", "batchPrompt")

.mlTransform(batch_completion))


display(completed_autobatch_df)


Prompt engineering for translation

Azure OpenAI can solve many different natural language tasks through prompt engineering. Here, we show an example of prompting for language translation:


Python


translate_df = spark.createDataFrame( [

("Japanese: Ookina hako \nEnglish: Big box \nJapanese: Midori tako\nEnglish:",),

("French: Quelle heure est-il à Montréal? \nEnglish: What time is it in Montreal? \nFrench: Où est le poulet? \nEnglish:",),

]

).toDF("prompt")


display(completion.transform(translate_df))


Prompt for question answering

Here, we prompt the GPT-3 model for general-knowledge question answering:


Python


qa_df = spark.createDataFrame( [

(

"Q: Where is the Grand Canyon?\nA: The Grand Canyon is in

Arizona.\n\nQ: What is the weight of the Burj Khalifa in kilograms?\nA:",

)

]

).toDF("prompt")


display(completion.transform(qa_df))



Additional resources

Azure OpenAI Service encryption of data at rest

Article • 04/21/2023


Azure OpenAI automatically encrypts your data when it's persisted to the cloud. The encryption protects your data and helps you meet your organizational security and compliance commitments. This article covers how Azure OpenAI handles encryption of data at rest, specifically training data and fine-tuned models. For information on how data provided by you to the service is processed, used, and stored, consult the data, privacy, and security article.


About Cognitive Services encryption

Azure OpenAI is part of Azure Cognitive Services. Cognitive Services data is encrypted and decrypted using FIPS 140-2 compliant 256-bit AES encryption. Encryption and decryption are transparent, meaning encryption and access are managed for you. Your data is secure by default and you don't need to modify your code or applications to take advantage of encryption.


About encryption key management

By default, your subscription uses Microsoft-managed encryption keys. There's also the option to manage your subscription with your own keys called customer-managed keys (CMK). CMK offers greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data.


Customer-managed keys with Azure Key Vault

Customer-managed keys (CMK), also known as Bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data.

You must use Azure Key Vault to store your customer-managed keys. You can either create your own keys and store them in a key vault, or you can use the Azure Key Vault APIs to generate keys. The Cognitive Services resource and the key vault must be in the same region and in the same Azure Active Directory (Azure AD) tenant, but they can be in different subscriptions. For more information about Azure Key Vault, see What is Azure Key Vault?.

To request the ability to use customer-managed keys, fill out and submit the Cognitive Services Customer-Managed Key Request Form . It will take approximately 3-5 business days to hear back on the status of your request.

To enable customer-managed keys, you must also enable both the Soft Delete and Do Not Purge properties on the key vault.

Only RSA keys of size 2048 are supported with Cognitive Services encryption. For more information about keys, see Key Vault keys in About Azure Key Vault keys, secrets and certificates.


Enable customer-managed keys for your resource

To enable customer-managed keys in the Azure portal, follow these steps:


  1. Go to your Cognitive Services resource.

  2. On the left, select Encryption.

  3. Under Encryption type, select Customer Managed Keys, as shown in the following screenshot.



Specify a key

After you enable customer-managed keys, you can specify a key to associate with the Cognitive Services resource.


Specify a key as a URI

To specify a key as a URI, follow these steps:

  1. In the Azure portal, go to your key vault.


  2. Under Settings, select Keys.


  3. Select the desired key, and then select the key to view its versions. Select a key version to view the settings for that version.

  4. Copy the Key Identifier value, which provides the URI.



  5. Go back to your Cognitive Services resource, and then select Encryption.


  6. Under Encryption key, select Enter key URI.


  7. Paste the URI that you copied into the Key URI box.


  8. Under Subscription, select the subscription that contains the key vault.


  9. Save your changes.


Specify a key from a key vault

To specify a key from a key vault, first make sure that you have a key vault that contains a key. Then follow these steps:

  1. Go to your Cognitive Services resource, and then select Encryption.


  2. Under Encryption key, select Select from Key Vault.


  3. Select the key vault that contains the key that you want to use.


  4. Select the key that you want to use.


  5. Save your changes.


Update the key version

When you create a new version of a key, update the Cognitive Services resource to use the new version. Follow these steps:

  1. Go to your Cognitive Services resource, and then select Encryption.

  2. Enter the URI for the new key version. Alternately, you can select the key vault and then select the key again to update the version.

  3. Save your changes.


Use a different key

To change the key that you use for encryption, follow these steps:


  1. Go to your Cognitive Services resource, and then select Encryption.

  2. Enter the URI for the new key. Alternately, you can select the key vault and then select a new key.

  3. Save your changes.


Rotate customer-managed keys

You can rotate a customer-managed key in Key Vault according to your compliance policies. When the key is rotated, you must update the Cognitive Services resource to use the new key URI. To learn how to update the resource to use a new version of the key in the Azure portal, see Update the key version.

Rotating the key doesn't trigger re-encryption of data in the resource. No further action is required from the user.


Revoke a customer-managed key

You can revoke a customer-managed encryption key by changing the access policy, by changing the permissions on the key vault, or by deleting the key.

To change the access policy of the managed identity that your registry uses, run the az-keyvault-delete-policy command:


Azure CLI


az keyvault delete-policy \

--resource-group <resource-group-name> \

--name <key-vault-name> \

--key_id <key-vault-key-id>

To delete the individual versions of a key, run the az-keyvault-key-delete command. This operation requires the keys/delete permission.


Azure CLI


az keyvault key delete \

--name <key-vault-name> \

--object-id $identityPrincipalID \


Important

Revoking access to an active customer-managed key while CMK is still enabled will prevent downloading of training data and results files, fine-tuning new models, and deploying fine-tuned models. However, previously deployed fine-tuned models will continue to operate and serve traffic until those deployments are deleted.


Delete training, validation, and training results data

The Files API allows customers to upload their training data for the purpose of fine-tuning a model. This data is stored in Azure Storage, within the same region as the resource and logically isolated with their Azure subscription and API Credentials.

Uploaded files can be deleted by the user via the DELETE API operation.

Delete fine-tuned models and deployments

The Fine-tunes API allows customers to create their own fine-tuned version of the OpenAI models based on the training data that you've uploaded to the service via the Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region, encrypted at rest (either with Microsoft-managed keys or customer-managed keys) and logically isolated with their Azure subscription and API credentials. Fine-tuned models and deployments can be deleted by the user by calling the DELETE API operation.


Disable customer-managed keys

When you disable customer-managed keys, your Cognitive Services resource is then encrypted with Microsoft-managed keys. To disable customer-managed keys, follow these steps:

  1. Go to your Cognitive Services resource, and then select Encryption.

  2. Select Microsoft Managed Keys > Save.


When you previously enabled customer managed keys this also enabled a system assigned managed identity, a feature of Azure AD. Once the system assigned managed identity is enabled, this resource will be registered with Azure Active Directory. After being registered, the managed identity will be given access to the Key Vault selected during customer managed key setup. You can learn more about Managed Identities.


Important

If you disable system assigned managed identities, access to the key vault will be removed and any data encrypted with the customer keys will no longer be accessible. Any features depended on this data will stop working.


Important

Managed identities do not currently support cross-directory scenarios. When you configure customer-managed keys in the Azure portal, a managed identity is automatically assigned under the covers. If you subsequently move the subscription, resource group, or resource from one Azure AD directory to another, the managed identity associated with the resource is not transferred to the new tenant, so customer-managed keys may no longer work. For more information, see Transferring a subscription between Azure AD directories in FAQs and known

issues with managed identities for Azure resources.

Next steps

Language service Customer-Managed Key Request Form Learn more about Azure Key Vault

Business Continuity and Disaster Recovery (BCDR) considerations with Azure OpenAI Service

Article • 02/17/2023 • 2 minutes to read


Azure OpenAI is available in two regions. Since subscription keys are region bound, when a customer acquires a key, they select the region in which their deployments will reside and from then on, all operations stay associated with that Azure server region.

It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either fail-over into another region or split the workload between two or more regions. Both approaches require at least two OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.


Best practices

Today customers will call the endpoint provided during deployment for both deployments and inference. These operations are stateless, so no data is lost in the case that a region becomes unavailable.

If a region is non-operational customers must take steps to ensure service continuity.


Business continuity

The following set of instructions applies both customers using default endpoints and those using custom endpoints.


Default endpoint recovery

If you're using a default endpoint, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.

Follow these steps to configure your client to monitor errors:


  1. Use this page to identify the list of available regions for the OpenAI service.

  2. Select a primary and one secondary/backup regions from the list.


  3. Create OpenAI Service resources for each region selected


  4. For the primary region and any backup regions your code will need to know:


    1. Base URI for the resource


    2. Regional access key or Azure Active Directory access


  5. Configure your code so that you monitor connectivity errors (typically connection timeouts and service unavailability errors).

    1. Given that networks yield transient errors, for single connectivity issue occurrences, the suggestion is to retry.

    2. For persistence redirect traffic to the backup resource in the region you've created.


BCDR requires custom code

The recovery from regional failures for this usage type can be performed instantaneously and at a very low cost. This does however, require custom development of this functionality on the client side of your application.



Additional resources

Monitoring Azure OpenAI Service

Article • 03/15/2023 • 5 minutes to read


When you have critical applications and business processes relying on Azure resources, you want to monitor those resources for their availability, performance, and operation.

This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of Cognitive Services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor common to all Azure services that use it, read Monitoring Azure resources with Azure Monitor.


Monitoring data

Azure OpenAI collects the same kinds of monitoring data as other Azure resources that are described in Monitoring data from Azure resources.


Collection and routing

Platform metrics and the Activity log are collected and stored automatically, but can be routed to other locations by using a diagnostic setting.

Resource Logs aren't collected and stored until you create a diagnostic setting and route them to one or more locations.

See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect.

Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has additional costs associated with it. To understand more, consult the Azure Monitor cost calculation guide.

The metrics and logs you can collect are discussed in the following sections.


Analyzing metrics

You can analyze metrics for Azure OpenAI by opening Metrics which can be found underneath the Monitoring section when viewing your Azure OpenAI resource in the Azure portal. See Getting started with Azure Metrics Explorer for details on using this tool.

Azure OpenAI is a part of Cognitive Services. For a list of all platform metrics collected for Cognitive Services and Azure OpenAI, see Cognitive Services supported metrics.

For the current subset of metrics available in Azure OpenAI:


Azure OpenAI Metrics


Metric

Exportable

Metric

Unit

Aggregation

Description

Dimensions


via

Display


Type




Diagnostic

Name






Settings?






BlockedCalls

Yes

Blocked Calls

Count

Total

Number of calls that exceeded rate or quota limit.

ApiName, OperationName, Region, RatelimitKey

ClientErrors

Yes

Client Errors

Count

Total

Number of calls with client side error (HTTP response code 4xx).

ApiName, OperationName, Region, RatelimitKey

DataIn

Yes

Data In

Bytes

Total

Size of

ApiName,






incoming

OperationName, Region






data in







bytes.


DataOut

Yes

Data Out

Bytes

Total

Size of

ApiName,






outgoing

OperationName, Region






data in







bytes.


FineTunedTrainingHours

Yes

Processed FineTuned Training Hours

Count

Total

Number of Training Hours Processed on an OpenAI FineTuned Model

ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region

Latency

Yes

Latency

MilliSeconds

Average

Latency in milliseconds.

ApiName, OperationName, Region, RatelimitKey

Ratelimit

Yes

Ratelimit

Count

Total

The current

Region, RatelimitKey






ratelimit of







the ratelimit







key.


ServerErrors

Yes

Server Errors

Count

Total

Number of calls with service internal error (HTTP response code 5xx).

ApiName, OperationName, Region, RatelimitKey

SuccessfulCalls

Yes

Successful Calls

Count

Total

Number of successful calls.

ApiName, OperationName, Region, RatelimitKey



Metric

Exportable

Metric

Unit

Aggregation

Description

Dimensions


via

Display


Type




Diagnostic

Name






Settings?






TokenTransaction

Yes

Processed Inference Tokens

Count

Total

Number of Inference Tokens Processed on an OpenAI Model

ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region

TotalCalls

Yes

Total Calls

Count

Total

Total number of calls.

ApiName, OperationName, Region, RatelimitKey

TotalErrors

Yes

Total Errors

Count

Total

Total number of calls with error response (HTTP

response code 4xx or 5xx).

ApiName, OperationName, Region, RatelimitKey


Analyzing logs

Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.


All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common schema is outlined in Azure Monitor resource log schema.

The Activity log is a type of platform log in Azure that provides insight into subscription-level events. You can view it independently or route it to Azure Monitor Logs, where you can do much more complex queries using Log Analytics.

For a list of the types of resource logs available for Azure OpenAI and other Cognitive Services, see Resource provider operations for Cognitive Services


Kusto queries


Important

When you select Logs from the Azure OpenAI menu, Log Analytics is opened with the query scope set to the current Azure OpenAI resource. This means that log queries will only include data from that resource. If you want to run a query that includes data from other resources or data from other Azure services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure Monitor Log Analytics for details.

To explore and get a sense of what type of information is available for your Azure OpenAI resource a useful query to start with once you have deployed a model and sent some completion calls through the playground is as follows:



AzureDiagnostics

| take 100

| project TimeGenerated, _ResourceId, Category,OperationName, DurationMs, ResultSignature, properties_s

Kusto


Here we return a sample of 100 entries and are displaying a subset of the available columns of data in the logs. The results are as follows:



If you wish to see all available columns of data, you can remove the scoping that is provided by the |

project

line:



AzureDiagnostics

| take 100

Kusto


You can also select the arrow next to the table name to view all available columns and associated data types.

To examine AzureMetrics run:



AzureMetrics

| take 100

| project TimeGenerated, MetricName, Total, Count, TimeGrain, UnitName

Kusto


Alerts

Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your customers notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and drawbacks.

Every organization's alerting needs are going to vary, and will also evolve over time. Generally all alerts should be actionable, with a specific intended response if the alert occurs. If there's no action for someone to take, then it might be something you want to capture in a report, but not in an alert. Some use cases may require alerting anytime certain error conditions exist. But in many environments, it might only be in cases where errors exceed a certain threshold for a period of time where sending an alert is warranted.

Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor Logs. As you analyze your log data over time, you may also find that a certain condition not occurring for a long enough period of time might be valuable to track with alerts. Sometimes the absence of an event in a log is just as important a signal as an error.

Depending on what type of application you're developing in conjunction with your use of Azure OpenAI, Azure Monitor Application Insights may offer additional monitoring benefits at the application layer.


Next steps

See Monitoring Azure resources with Azure Monitor for details on monitoring Azure resources. Read Understand log searches in Azure Monitor logs.



Additional resources


Documentation

Plan to manage costs for Azure OpenAI Service - Azure Cognitive Services

Learn how to plan for and manage costs for Azure OpenAI by using cost analysis in the Azure portal.

Azure OpenAI Service encryption of data at rest - Azure Cognitive Services

Learn how Azure OpenAI encrypts your data when it's persisted to the cloud.

Models - REST API (Azure Cognitive Services)

Learn more about [Cognitive Services Models Operations]. How to [Get,List].

How to Configure Azure OpenAI Service with Managed Identities - Azure OpenAI

Provides guidance on how to set managed identity with Azure Active Directory

Azure OpenAI Service content filtering - Azure OpenAI

Learn about the content filtering capabilities of Azure OpenAI in Azure Cognitive Services

How-to - Use Azure OpenAI Service with large datasets - Azure OpenAI

Walkthrough on how to integrate Azure OpenAI with SynapseML and Apache Spark to apply large language models at a distributed scale.

Deployments - REST API (Azure Cognitive Services)

Learn more about [Cognitive Services Deployments Operations]. How to [Create,Delete,Get,List,Update].

Fine Tunes - REST API (Azure Cognitive Services)

Learn more about [Cognitive Services Fine Tunes Operations]. How to [Cancel,Create,Delete,Get,Get Events,List].

Show 5 more

Plan to manage costs for Azure OpenAI Service

Article • 04/05/2023 • 7 minutes to read


This article describes how you plan for and manage costs for Azure OpenAI Service. Before you deploy the service, you can use the Azure pricing calculator to estimate costs for Azure OpenAI. Later, as you deploy Azure resources, review the estimated costs.

After you've started using Azure OpenAI resources, use Cost Management features to set budgets and monitor costs. You can also review forecasted costs and identify spending trends to identify areas where you might want to act. Costs for Azure OpenAI Service are only a portion of the monthly costs in your Azure bill. Although this article explains how to plan for and manage costs for Azure OpenAI, you're billed for all Azure services and resources used in your Azure subscription, including the third-party services.


Prerequisites

Cost analysis in Cost Management supports most Azure account types, but not all of them. To view the full list of supported account types, see Understand Cost Management data. To view cost data, you need at least read access for an Azure account. For information about assigning access to Azure Cost Management data, see Assign access to data.


Estimate costs before using Azure OpenAI

Use the Azure pricing calculator to estimate the costs of using Azure OpenAI.


Understand the full billing model for Azure OpenAI Service

Azure OpenAI Service runs on Azure infrastructure that accrues costs when you deploy new resources. It's important to understand that there could be other additional infrastructure costs that might accrue.


How you're charged for Azure OpenAI Service


Base series and Codex series models

Azure OpenAI base series and Codex series models are charged per 1,000 tokens. Costs vary depending on which model series you choose: Ada, Babbage, Curie, Davinci, or Code-Cushman.

Our models understand and process text by breaking it down into tokens. For reference, each token is roughly four characters for typical English text.

Token costs are for both input and output. For example, if you have a 1,000 token JavaScript code sample that you ask an Azure OpenAI model to convert to Python. You would be charged approximately 1,000 tokens for the initial input request sent, and 1,000 more tokens for the output that is received in response for a total of 2,000 tokens.

In practice, for this type of completion call the token input/output wouldn't be perfectly 1:1. A conversion from one programming language to another could result in a longer or shorter output depending on many different factors including the value assigned to the max_tokens parameter.


Base Series and Codex series fine-tuned models

Azure OpenAI fine-tuned models are charged based on three factors: Training hours

Hosting hours

Inference per 1,000 tokens


The hosting hours cost is important to be aware of since once a fine-tuned model is deployed it continues to incur an hourly cost regardless of whether you're actively using it. Fine-tuned model costs should be monitored closely.


Important

After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment will automatically be deleted. The deployment of a customized model is “inactive” if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment does NOT delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI

Service pricing , each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.

Other costs that might accrue with Azure OpenAI Service

Keep in mind that enabling capabilities like sending data to Azure Monitor Logs, alerting, etc. incurs additional costs for those services. These costs are visible under those other services and at the subscription level, but aren't visible when scoped just to your Azure OpenAI resource.


Using Azure Prepayment with Azure OpenAI Service

You can pay for Azure OpenAI Service charges with your Azure Prepayment credit. However, you can't use Azure Prepayment credit to pay for charges for third party products and services including those from the Azure Marketplace.


Monitor costs

As you use Azure resources with Azure OpenAI, you incur costs. Azure resource usage unit costs vary by time intervals (seconds, minutes, hours, and days) or by unit usage (bytes, megabytes, and so on.) As soon as Azure OpenAI use starts, costs can be incurred and you can see the costs in cost analysis.

When you use cost analysis, you view Azure OpenAI costs in graphs and tables for different time intervals. Some examples are by day, current and prior month, and year. You also view costs against budgets and forecasted costs. Switching to longer views over time can help you identify spending trends. And you see where overspending might have occurred. If you've created budgets, you can also easily see where they're exceeded.

To view Azure OpenAI costs in cost analysis:


  1. Sign in to the Azure portal.

  2. Select one of your Azure OpenAI resources.

  3. Under Resource Management select Cost analysis

  4. By default cost analysis is scoped to the individual Azure OpenAI resource.


To understand the breakdown of what makes up that cost, it can help to modify Group by to Meter and in this case switching the chart type to Line. You can now see that for this particular resource the source of the costs is from three different model series with Text-Davinci Tokens representing the bulk of the costs.



It's important to understand scope when evaluating costs associated with Azure OpenAI. If your resources are part of the same resource group you can scope Cost Analysis at that level to understand the effect on costs. If your resources are spread across multiple resource groups you can scope to the subscription level.

However, when scoped at a higher level you often need to add additional filters to be able to zero in on Azure OpenAI usage. When scoped at the subscription level we see a number of other resources that we may not care about in the context of Azure OpenAI cost management. When scoping at the subscription level, we recommend navigating to the full Cost analysis tool under the Cost Management service. Search for "Cost Management" in the top Azure search bar to navigate to the full service experience, which includes more options like creating budgets.


If you try to add a filter by service, you'll find that you can't find Azure OpenAI in the list. This is because technically Azure OpenAI is part of Cognitive Services so the service level filter is Cognitive Services, but if you want to see all Azure OpenAI resources across a subscription without any other type of Cognitive Services resources you need to instead scope to Service tier: Azure OpenAI:



Create budgets

You can create budgets to manage costs and create alerts that automatically notify stakeholders of spending anomalies and overspending risks. Alerts are based on spending compared to budget and cost thresholds. Budgets and alerts are created for Azure subscriptions and resource groups, so they're useful as part of an overall cost monitoring strategy.

Budgets can be created with filters for specific resources or services in Azure if you want more granularity present in your monitoring. Filters help ensure that you don't accidentally create new resources that cost you additional money. For more information about the filter options available when you create a budget, see Group and filter options.


Important

While OpenAI has an option for hard limits that will prevent you from going over your budget, Azure OpenAI does not currently provide this functionality. You are able to kick off automation from action groups as part of your budget notifications to take more advanced actions, but this requires additional custom development on your part.


Export cost data

You can also export your cost data to a storage account. This is helpful when you need or others to do additional data analysis for costs. For example, a finance team can analyze the data using Excel or Power BI. You can export your costs on a daily, weekly, or monthly schedule and set a custom date range. Exporting cost data is the recommended way to retrieve cost datasets.


Next steps

Learn how to optimize your cloud investment with Azure Cost Management. Learn more about managing costs with cost analysis.

Learn about how to prevent unexpected costs.

Take the Cost Management guided learning course.

Tutorial: Explore Azure OpenAI Service embeddings and document search

Article • 04/25/2023


This tutorial will walk you through using the Azure OpenAI embeddings API to perform document search where you'll query a knowledge base to find the most relevant document.

In this tutorial, you learn how to:


Install Azure OpenAI and other dependent Python libraries.

Download the BillSum dataset and prepare it for analysis.

Create environment variables for your resources endpoint and API key.

Use the text-embedding-ada-002 (Version 2) model

Use cosine similarity to rank search results.


Important

We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.


Prerequisites

An Azure subscription - Create one for free

Access granted to Azure OpenAI in the desired Azure subscription Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . Open an issue on this repo to contact us if you have an issue.

Python 3.7.1 or later version

The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-learn, pandas, tiktoken.

Jupyter Notebooks

An Azure OpenAI resource with the text-embedding-ada-002 (Version 2) model deployed. This model is currently only available in certain regions. If you don't have a resource the process of creating one is documented in our resource deployment guide.


Set up

Python libraries

If you haven't already, you need to install the following libraries:


Windows Command Prompt


pip install openai num2words matplotlib plotly scipy scikit-learn pandas tiktoken


Download the BillSum dataset

BillSum is a dataset of United States Congressional and California state bills. For illustration purposes, we'll look only at the US bills. The corpus consists of bills from the 103rd-115th (1993-2018) sessions of Congress. The data was split into 18,949 train bills and 3,269 test bills. The BillSum corpus focuses on mid-length legislation from 5,000 to 20,000 characters in length. More information on the project and the original academic paper where this dataset is derived from can be found on the BillSum project's GitHub repository


bill_sum_data.csv

This tutorial uses the sample data .

file that can be downloaded from our GitHub


You can also download the sample data by running the following command on your local machine:


Windows Command Prompt


curl "https://raw.githubusercontent.com/Azure-Samples/Azure-OpenAI-Docs-

Samples/main/Samples/Tutorials/Embeddings/data/bill_sum_data.csv" --output bill_sum_data.csv


Retrieve key and endpoint

To successfully make a call against Azure OpenAI, you'll need an endpoint and a key.

Variable Value name

ENDPOINT This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in Azure OpenAI Studio > Playground > Code View. An example endpoint is: https://docs-test-001.openai.azure.com .

API-KEY This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2 .


Go to your resource in the Azure portal. The Endpoint and Keys can be found in the

Resource Management section. Copy your endpoint and access key as you'll need both

KEY1

for authenticating your API calls. You can use either or KEY2 . Always having two

keys allows you to securely rotate and regenerate keys without causing a service disruption.

Create and assign persistent environment variables for your key and endpoint.


Environment variables


Command Line


setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"

CMD


setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"

CMD


print(dataframe_name)

dataframe_name

After setting the environment variables, you may need to close and reopen Jupyter notebooks or whatever IDE you're using in order for the environment variables to be accessible. While we strongly recommend using Jupyter Notebooks, if for some reason you cannot you'll need to modify any code that is returning a pandas dataframe by

using

rather than just calling the

directly as is

often done at the end of a code block.


Run the following code in your preferred Python IDE:


Import libraries and list models


Python


import openai import os

import re

import requests import sys

from num2words import num2words import os

import pandas as pd import numpy as np

from openai.embeddings_utils import get_embedding, cosine_similarity import tiktoken


API_KEY = os.getenv("AZURE_OPENAI_API_KEY")

RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")


openai.api_type = "azure" openai.api_key = API_KEY

openai.api_base = RESOURCE_ENDPOINT openai.api_version = "2022-12-01"


url = openai.api_base + "/openai/deployments?api-version=2022-12-01" r = requests.get(url, headers={"api-key": API_KEY})

print(r.text)


Output


{

"data": [

{

"scale_settings": {

"scale_type": "standard"

},

"model": "text-embedding-ada-002", "owner": "organization-owner",

"id": "text-embedding-ada-002",

"status": "succeeded",

"created_at": 1657572678,

"updated_at": 1657572678, "object": "deployment"

},

{

"scale_settings": {

"scale_type": "standard"

},

"model": "code-cushman-001",

"owner": "organization-owner", "id": "code-cushman-001",

"status": "succeeded",

"created_at": 1657572712,


"updated_at": 1657572712, "object": "deployment"

},

{

"scale_settings": {

"scale_type": "standard"

},

"model": "text-search-curie-doc-001", "owner": "organization-owner",

"id": "text-search-curie-doc-001", "status": "succeeded",

"created_at": 1668620345,

"updated_at": 1668620345, "object": "deployment"

},

{

"scale_settings": {

"scale_type": "standard"

},

"model": "text-search-curie-query-001", "owner": "organization-owner",

"id": "text-search-curie-query-001", "status": "succeeded",

"created_at": 1669048765,

"updated_at": 1669048765, "object": "deployment"

}

],

"object": "list"

}


The output of this command will vary based on the number and type of models you've deployed. In this case, we need to confirm that we have an entry for text-embedding-ada-002. If you find that you're missing this model, you'll need to deploy the model to your resource before proceeding.

Now we need to read our csv file and create a pandas DataFrame. After the initial DataFrame is created, we can view the contents of the table by running df .


Python


df=pd.read_csv(os.path.join(os.getcwd(),'bill_sum_data.csv')) # This assumes that you have placed the bill_sum_data.csv in the same directory you are

running Jupyter Notebooks df

Output:


The initial table has more columns than we need we'll create a new smaller DataFrame

df_bills

called which will contain only the columns for text , summary , and title .


Python


df_bills = df[['text', 'summary', 'title']] df_bills


Output:



Next we'll perform some light data cleaning by removing redundant whitespace and cleaning up the punctuation to prepare the data for tokenization.


Python


pd.options.mode.chained_assignment = None #https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters


# s is input text

def normalize_text(s, sep_token = " \n "): s = re.sub(r'\s+', ' ', s).strip()

s = re.sub(r". ,","",s)

# remove all instances of multiple spaces s = s.replace("..",".")

s = s.replace(". .",".")

s = s.replace("\n", "") s = s.strip()


return s


df_bills['text']= df_bills["text"].apply(lambda x : normalize_text(x))


Now we need to remove any bills that are too long for the token limit (8192 tokens).


Python


tokenizer = tiktoken.get_encoding("cl100k_base")

df_bills['n_tokens'] = df_bills["text"].apply(lambda x: len(tokenizer.encode(x)))

df_bills = df_bills[df_bills.n_tokens<8192] len(df_bills)


Output


20


Note

In this case all bills are under the embedding model input token limit, but you can use the technique above to remove entries that would otherwise cause embedding to fail. When faced with content that exceeds the embedding limit, you can also chunk the content into smaller pieces and then embed those one at a time.

We'll once again examine df_bills.


Python


df_bills

Output:



To understand the n_tokens column a little more as well how text ultimately is tokenized, it can be helpful to run the following code:


Python


sample_encode = tokenizer.encode(df_bills.text[0])

decode = tokenizer.decode_tokens_bytes(sample_encode) decode

For our docs we're intentionally truncating the output, but running this command in your environment will return the full text from index zero tokenized into chunks. You can see that in some cases an entire word is represented with a single token whereas in others parts of words are split across multiple tokens.


Output


[b'SECTION',

b' ',

b'1',

b'.',

b' SHORT',

b' TITLE',

b'.',

b' This',

b' Act',

b' may',

b' be',

b' cited',


b' as',

b' the',

b' ``',

b'National', b' Science',

b' Education', b' Tax',

b' In',

b'cent',

b'ive',

b' for',

b' Businesses', b' Act',

b' of',

b' ',

b'200',

b'7',

b"''.", b' SEC',

b'.',

b' ',

b'2',

b'.',

b' C',

b'RED',

b'ITS',

b' FOR',

b' CERT',

b'AIN',

b' CONTRIBUT', b'IONS',

b' BEN',

b'EF',

b'IT',

b'ING',

b' SC',


decode

If you then check the length of the number in the n_tokens column.

variable, you'll find it matches the first


Python


len(decode)


Output


1466


Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents

n_tokens

yet. The column is simply a way of making sure none of the data we pass to

ada_v2

the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill

will have its own corresponding embedding vector in the new right side of the DataFrame.

column on the


Python


df_bills['ada_v2'] = df_bills["text"].apply(lambda x : get_embedding(x,

engine = 'text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2)

model


Python


df_bills


Output:



As we run the search code block below, we'll embed the search query "Can I get

information on cable company tax revenue?" with the same text-embedding-ada-002 (Version 2) model. Next we'll find the closest bill embedding to the newly embedded text from our query ranked by cosine similarity.


Python


# search through the reviews for a specific product

def search_docs(df, user_query, top_n=3, to_print=True):


embedding = get_embedding( user_query,

engine="text-embedding-ada-002" # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model

)

df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))


res = (

df.sort_values("similarities", ascending=False)

.head(top_n)

)

if to_print:

display(res) return res


res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)


Output:



Finally, we'll show the top result from document search based on user query against the entire knowledge base. This returns the top result of the "Taxpayer's Right to View Act of 1993". This document has a cosine similarity score of 0.76 between the query and the document:


Python


res["summary"][9]


Output


"Taxpayer's Right to View Act of 1993 - Amends the Communications Act of 1934 to prohibit a cable operator from assessing separate charges for any

video programming of a sporting, theatrical, or other entertainment event if that event is performed at a facility constructed, renovated, or maintained with tax revenues or by an organization that receives public financial

support. Authorizes the Federal Communications Commission and local

franchising authorities to make determinations concerning the applicability of such prohibition. Sets forth conditions under which a facility is

considered to have been constructed, maintained, or renovated with tax

revenues. Considers events performed by nonprofit or public organizations that receive tax subsidies to be subject to this Act if the event is


sponsored by, or includes the participation of a team that is part of, a tax exempt organization."


Using this approach, you can use embeddings as a search mechanism across documents in a knowledge base. The user can then take the top search result and use it for their downstream task, which prompted their initial query.


Clean up resources

If you created an OpenAI resource solely for completing this tutorial and want to clean up and remove an OpenAI resource, you'll need to delete your deployed models, and then delete the resource or associated resource group if it's dedicated to your test resource. Deleting the resource group also deletes any other resources associated with it.


Portal Azure CLI


Next steps

Learn more about Azure OpenAI's models:


Azure OpenAI Service models

Azure OpenAI speech to speech chat

Article • 04/19/2023


Reference documentation | Package (NuGet) | Additional Samples on GitHub


Important

To complete the steps in this guide, access must be granted to Microsoft Azure OpenAI Service in the desired Azure subscription. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access .


In this how-to guide, you can use Azure Cognitive Services Speech to converse with Azure OpenAI Service. The text recognized by the Speech service is sent to Azure OpenAI. The text response from Azure OpenAI is then synthesized by the Speech service.

Speak into the microphone to start a conversation with Azure OpenAI.


The Speech service recognizes your speech and converts it into text (speech to text).

Your request as text is sent to Azure OpenAI.

The Speech service text to speech (TTS) feature synthesizes the response from Azure OpenAI to the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation.


Prerequisites


Azure subscription - Create one for free

Create a Speech resource in the Azure portal.

Get the Speech resource key and region. After your Speech resource is deployed,

select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.


Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide

for any more requirements.


Set environment variables

This example requires environment variables named OPEN_AI_KEY , OPEN_AI_ENDPOINT , SPEECH_KEY , and SPEECH_REGION .

Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource , write it to a new environment variable on the local machine running the application.


Tip

Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.

To set the environment variables, open a console window, and follow the instructions for your operating system and development environment.


OPEN_AI_KEY

To set the

environment variable, replace

with one of

your-openai-key

OPEN_AI_ENDPOINT

the keys for your resource. To set the


your-openai-endpoint

environment variable, replace

SPEECH_KEY

your-speech-key

with one of the regions for your resource.

To set the

environment variable, replace

with one of

SPEECH_REGION

your-speech-region

the keys for your resource.

To set the

environment variable, replace

with

one of the regions for your resource.


Note

Windows


setx OPEN_AI_KEY your-openai-key

setx OPEN_AI_ENDPOINT your-openai-endpoint setx SPEECH_KEY your-speech-key

setx SPEECH_REGION your-speech-region

Console

If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx .


After you add the environment variables, you may need to restart any running programs that will need to read the environment variable, including the console window. For example, if you are using Visual Studio as your editor, restart Visual Studio before running the example.


Recognize speech from a microphone

Follow these steps to create a new console application.


Program.cs

  1. Open a command prompt where you want the new project, and create a console

    application with the .NET CLI. The directory.

    file should be created in the project


    .NET CLI


    dotnet new console


  2. Install the Speech SDK in your new project with the .NET CLI.


    .NET CLI


    dotnet add package Microsoft.CognitiveServices.Speech

  3. Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.


    .NET CLI


    dotnet add package Azure.AI.OpenAI --prerelease


    Program.cs

  4. Replace the contents of with the following code.


    C#


    using System;

    using System.IO;

    using System.Threading.Tasks;

    using Microsoft.CognitiveServices.Speech;

    using Microsoft.CognitiveServices.Speech.Audio; using Azure;

    using Azure.AI.OpenAI;

    using static System.Environment;


    class Program

    {

    // This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT"

    // Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/

    static string openAIKey =

    Environment.GetEnvironmentVariable("OPEN_AI_KEY"); static string openAIEndpoint =

    Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT");


    // Enter the deployment name you chose when you deployed the model. static string engine = "text-davinci-002";


    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"

    static string speechKey =

    Environment.GetEnvironmentVariable("SPEECH_KEY"); static string speechRegion =

    Environment.GetEnvironmentVariable("SPEECH_REGION");


    // Prompts Azure OpenAI with a request and synthesizes the response.

    async static Task AskOpenAI(string prompt)

    {

    // Ask Azure OpenAI

    OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));

    var completionsOptions = new CompletionsOptions()

    {

    Prompts = { prompt }, MaxTokens = 100,

    };

    Response<Completions> completionsResponse = client.GetCompletions(engine, completionsOptions);

    string text = completionsResponse.Value.Choices[0].Text.Trim(); Console.WriteLine($"Azure OpenAI response: {text}");


    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

    // The language of the voice that speaks.

    speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural";

    var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();


    using (var speechSynthesizer = new

    SpeechSynthesizer(speechConfig, audioOutputConfig))

    {

    var speechSynthesisResult = await

    speechSynthesizer.SpeakTextAsync(text).ConfigureAwait(true);


    if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)

    {

    Console.WriteLine($"Speech synthesized to speaker for text: [{text}]");

    }

    else if (speechSynthesisResult.Reason == ResultReason.Canceled)

    {

    var cancellationDetails =

    SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult); Console.WriteLine($"Speech synthesis canceled:

    {cancellationDetails.Reason}");


    if (cancellationDetails.Reason == CancellationReason.Error)

    {

    Console.WriteLine($"Error details:

    {cancellationDetails.ErrorDetails}");

    }

    }

    }

    }


    // Continuously listens for speech input to recognize and send as text to Azure OpenAI

    async static Task ChatWithOpenAI()

    {

    // Should be the locale for the speaker's language.

    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

    speechConfig.SpeechRecognitionLanguage = "en-US";


    using var audioConfig =

    AudioConfig.FromDefaultMicrophoneInput();

    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    var conversationEnded = false;


    while(!conversationEnded)

    {

    Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");


    // Get audio from the microphone and then send it to the TTS service.

    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();


    switch (speechRecognitionResult.Reason)

    {

    case ResultReason.RecognizedSpeech:

    if (speechRecognitionResult.Text == "Stop.")

    {

    Console.WriteLine("Conversation ended."); conversationEnded = true;

    }


    else

    {

    Console.WriteLine($"Recognized speech:

    {speechRecognitionResult.Text}");

    await

    AskOpenAI(speechRecognitionResult.Text).ConfigureAwait(true);

    }

    break;

    case ResultReason.NoMatch:

    Console.WriteLine($"No speech could be recognized:

    ");

    break;

    case ResultReason.Canceled: var cancellationDetails =

    CancellationDetails.FromResult(speechRecognitionResult);

    Console.WriteLine($"Speech Recognition canceled:

    {cancellationDetails.Reason}");

    if (cancellationDetails.Reason == CancellationReason.Error)

    {

    Console.WriteLine($"Error details=

    {cancellationDetails.ErrorDetails}");

    }

    break;

    }

    }

    }


    async static Task Main(string[] args)

    {

    try

    {

    await ChatWithOpenAI().ConfigureAwait(true);

    }

    catch (Exception ex)

    {

    Console.WriteLine(ex.Message);

    }

    }

    }


    MaxTokens

    CompletionsOptions

  5. To increase or decrease the number of tokens returned by Azure OpenAI, change

the

property in the

class instance. For more

information tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing .

Run your new console application to start speech recognition from a microphone:


Console


dotnet run

Important

Make sure that you set the OPEN_AI_KEY , OPEN_AI_ENDPOINT , SPEECH KEY and SPEECH REGION environment variables as described previously. If you don't set these variables, the sample will fail with an error message.


Speak into your microphone when prompted. The console output includes the prompt for you to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI should be converted from text to speech and then output to the default speaker.


Console


PS C:\dev\openai\csharp> dotnet run

Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.

Recognized speech:Make a comma separated list of all continents.

Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America

Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]

Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.

Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in

parentheses.

Azure OpenAI response:Mauna Kea Observatories (North America), La Silla

Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft

Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica) Speech synthesized to speaker for text [Mauna Kea Observatories (North

America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong

Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]

Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.

Conversation ended.

PS C:\dev\openai\csharp>


Remarks

Now that you've completed the quickstart, here are some more considerations:


en-US

To change the speech recognition language, replace with another supported

es-ES

en-US

language. For example, for Spanish (Spain). The default language is if

you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see language identification.

en-US-JennyMultilingualNeural

To change the voice that you hear, replace with

another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.

text-davinci-002

To use a different model, replace with the ID of another

deployment. Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio .

Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the content filtering article.


Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.


Next steps

Learn more about Speech

Learn more about Azure OpenAI

Overview of Responsible AI practices for Azure OpenAI models

Article • 05/19/2023


Many of the Azure OpenAI models are generative AI models that have demonstrated improvements in advanced capabilities such as content and code generation, summarization, and search. With many of these improvements also come increased responsible AI challenges related to harmful content, manipulation, human-like behavior, privacy, and more. For more information about the capabilities, limitations and appropriate use cases for these models, please review the Transparency Note.

In addition to the Transparency Note, we have created technical recommendations and resources to help customers design, develop, deploy, and use AI systems that implement the Azure OpenAI models responsibly. Our recommendations are grounded in the Microsoft Responsible AI Standard , which sets policy requirements that our own engineering teams follow. Much of the content of the Standard follows a pattern, asking teams to Identify, Measure, and Mitigate potential harms, and plan for how to Operate the AI system as well. In alignment with those practices, these recommendations are organized into four stages:

  1. Identify : Identify and prioritize potential harms that could result from your AI system through iterative red-teaming, stress-testing, and analysis.

  2. Measure : Measure the frequency and severity of those harms by establishing clear metrics, creating measurement test sets, and completing iterative, systematic testing (both manual and automated).

  3. Mitigate : Mitigate harms by implementing tools and strategies such as prompt engineering and using our content filters. Repeat measurement to test effectiveness after implementing mitigations.

  4. Operate : Define and execute a deployment and operational readiness plan.

In addition to their correspondence to the Microsoft Responsible AI Standard, these stages correspond closely to the functions in the NIST AI Risk Management Framework .


Identify

Identifying potential harms that could occur in or be caused by an AI system is the first stage of the Responsible AI lifecycle. The earlier you begin to identify potential harms, the more effective you can be at mitigating the harms. When assessing potential harms, it is important to develop an understanding of the types of harms that could result from

using the Azure OpenAI Service in your specific context(s). In this section, we provide recommendations and resources you can use to identify harms through an impact assessment, iterative red team testing, stress-testing, and analysis. Red teaming and stress-testing are approaches where a group of testers come together and intentionally probe a system to identify its limitations, risk surface, and vulnerabilities.

These steps have the goal of producing a prioritized list of potential harms for each specific scenario.

  1. Identify harms that are relevant for your specific model, application, and deployment scenario.

    1. Identify potential harms associated with the model and model capabilities (for example, GPT-3 model vs GPT-4 model) that you're using in your system. This is important to consider because each model has different capabilities, limitations, and risks, as described more fully in the sections above.

    2. Identify any other harms or increased scope of harm presented by the intended use of the system you're developing. Consider using a Responsible AI Impact Assessment to identify potential harms.

      1. For example, let's consider an AI system that summarizes text. Some uses of text generation are lower risk than others. For example, if the system is to be used in a healthcare domain for summarizing doctor's notes, the risk of harm arising from inaccuracies is higher than if the system is summarizing online articles.

  2. Prioritize harms based on elements of risk such as frequency and severity. Assess the level of risk for each harm and the likelihood of each risk occurring in order to prioritize the list of harms you've identified. Consider working with subject matter experts and risk managers within your organization and with relevant external stakeholders when appropriate.

  3. Conduct red team testing and stress testing starting with the highest priority harms, to develop a better understanding of whether and how the identified harms are actually occurring in your scenario, as well as to identify new harms you didn't initially anticipate.

  4. Share this information with relevant stakeholders using your organization's internal compliance processes.

At the end of this Identify stage, you should have a documented, prioritized list of harms. When new harms and new instances of harms emerge through further testing and use of the system, you can update and improve this list by following the above process again.


Measure

Once a list of prioritized harms has been identified, the next stage involves developing an approach for systematic measurement of each harm and conducting evaluations of the AI system. There are manual and automated approaches to measurement. We recommend you do both, starting with manual measurement.

Manual measurement is useful for:


  1. Measuring progress on a small set of priority issues. When mitigating specific harms, it's often most productive to keep manually checking progress against a small dataset until the harm is no longer observed before moving to automated measurement.

  2. Defining and reporting metrics until automated measurement is reliable enough to

    use alone.

  3. Spot-checking periodically to measure the quality of automatic measurement.


Automated measurement is useful for:


  1. Measuring at a large scale with increased coverage to provide more comprehensive results.

  2. Ongoing measurement to monitor for any regression as the system, usage, and mitigations evolve.

Below, we provide specific recommendations to measure your AI system for potential harms. We recommend you first complete this process manually and then develop a plan to automate the process:

  1. Create inputs that are likely to produce each prioritized harm: Create measurement set(s) by generating many diverse examples of targeted inputs that are likely to produce each prioritized harm.

  2. Generate System Outputs: Pass in the examples from the measurement sets as inputs to the system to generate system outputs. Document the outputs.

  3. Evaluate System Outputs and Report Results to Relevant Stakeholders

    1. Define clear metric(s). For each intended use of your system, establish metrics that measure the frequency and degree of severity of each potentially harmful output. Create clear definitions to classify outputs that will be considered harmful or problematic in the context of your system and scenario, for each type of prioritized harm you identified.

    2. Assess the outputs against the clear metric definitions and record and quantify the occurrences of harmful outputs. Repeat the measurements periodically, to assess mitigations and monitor for any regression.

    3. Share this information with relevant stakeholders using your organization's internal compliance processes.

At the end of this measurement stage, you should have a defined measurement approach to benchmark how your system performs for each potential harm as well as an initial set of documented results. As you continue implementing and testing mitigations, the metrics and measurement sets should continue to be refined (for example, to add metrics for new harms that were initially unanticipated) and the results updated.


Mitigate

Mitigating harms presented by large language models such as the Azure OpenAI models requires an iterative, layered approach that includes experimentation and continual measurement. We recommend developing a mitigation plan that encompasses four layers of mitigations for the harms identified in the earlier stages of this process:



  1. At the model level, it's important to understand the model(s) you'll be using and what fine-tuning steps may have been taken by the model developers to align the model towards its intended uses and to reduce the risk of potentially harmful uses and outcomes.

    1. For example, for GPT-4, model developers have been able to use reinforcement learning methods as a responsible AI tool to better align the model towards the designers' intended goals.

  2. At the safety system level, you should understand the platform level mitigations that have been implemented. such as the Azure OpenAI content filters which help

    to block the output of harmful content.

  3. At the application level, application developers can implement metaprompt and user-centered design and user experience mitigations. Metaprompts are instructions provided to the model to guide its behavior; their use can make a critical difference in guiding the system to behave in accordance with your expectations. User-centered design and user experience (UX) interventions are also key mitigation tools to prevent misuse and overreliance on AI.

  4. At the positioning level, there are many ways to educate the people who will use or be affected by your system about its capabilities and limitations.

Below, we provide specific recommendations to implement mitigations at the different layers. Not all of these mitigations are appropriate for every scenario, and conversely, these mitigations may be insufficient for some scenarios. Give careful consideration to your scenario and the prioritized harms you identified, and as you implement mitigations, develop a process to measure and document their effectiveness for your system and scenario.

  1. Model level Mitigations: Review and identify which Azure OpenAI base model is best suited for the system you're building and educate yourself about its capabilities, limitations, and any measures taken to reduce the risk of the potential harms you've identified. For example, if you're using GPT-4, in addition to reading this Transparency Note, you can review OpenAI's GPT-4 System Card explaining the safety challenges presented by the model and the safety processes that OpenAI adopted to prepare GPT-4 for deployment. It may be worth experimenting with different versions of the model(s) (including through red teaming and measuring) to see how the harms present differently.

  2. Safety System Level Mitigations: Identify and evaluate the effectiveness of platform level solutions such as the Azure OpenAI content filters to help mitigate the potential harms that you have identified.

  3. Application Level Mitigations: Prompt engineering, including metaprompt tuning, can be an effective mitigation for many different types of harm. Review and implement metaprompt (also called the "system message" or "system prompt") guidance and best practices documented here.

    We recommend implementing the following user-centered design and user experience (UX) interventions, guidance, and best practices to guide users to use the system as intended and to prevent overreliance on the AI system:

    1. Review and edit interventions: Design the user experience (UX) to encourage people who use the system to review and edit the AI-generated outputs before accepting them (see HAX G9 : Support efficient correction).

    2. Highlight potential inaccuracies in the AI-generated outputs (see HAX G2 : Make clear how well the system can do what it can do), both when users first

      start using the system and at appropriate times during ongoing use. In the first run experience (FRE), notify users that AI-generated outputs may contain inaccuracies and that they should verify information. Throughout the experience, include reminders to check AI-generated output for potential inaccuracies, both overall and in relation to specific types of content the system may generate incorrectly. For example, if your measurement process has determined that your system has lower accuracy with numbers, mark numbers in generated outputs to alert the user and encourage them to check the numbers or seek external sources for verification.

    3. User responsibility. Remind people that they are accountable for the final content when they're reviewing AI-generated content. For example, when offering code suggestions, remind the developer to review and test suggestions before accepting.

    4. Disclose AI's role in the interaction. Make people aware that they are interacting with an AI system (as opposed to another human). Where appropriate, inform content consumers that content has been partly or fully generated by an AI model; such notices may be required by law or applicable best practices, and can reduce inappropriate reliance on AI-generated outputs and can help consumers use their own judgment about how to interpret and act on such content.

    5. Prevent the system from anthropomorphizing. AI models may output content containing opinions, emotive statements, or other formulations that could imply that they're human-like, that could be mistaken for a human identity, or that could mislead people to think that a system has certain capabilities when it doesn't. Implement mechanisms that reduce the risk of such outputs or incorporate disclosures to help prevent misinterpretation of outputs.

    6. Cite references and information sources. If your system generates content based on references sent to the model, clearly citing information sources helps people understand where the AI-generated content is coming from.

    7. Limit the length of inputs and outputs, where appropriate. Restricting input and output length can reduce the likelihood of producing undesirable content, misuse of the system beyond its intended uses, or other harmful or unintended uses.

    8. Structure inputs and/or system outputs. Use prompt engineering techniques within your application to structure inputs to the system to prevent open-ended responses. You can also limit outputs to be structured in certain formats or patterns. For example, if your system generates dialog for a fictional character in

      response to queries, limit the inputs so that people can only query for a predetermined set of concepts.

    9. Prepare pre-determined responses. There are certain queries to which a model may generate offensive, inappropriate, or otherwise harmful responses. When harmful or offensive queries or responses are detected, you can design your system to deliver a predetermined response to the user. Predetermined responses should be crafted thoughtfully. For example, the application can provide prewritten answers to questions such as "who/what are you?" to avoid having the system respond with anthropomorphized responses. You can also use predetermined responses for questions like, "What are your terms of use?" to direct people to the correct policy.

    10. Restrict automatic posting on social media. Limit how people can automate your product or service. For example, you may choose to prohibit automated posting of AI-generated content to external sites (including social media), or to prohibit the automated execution of generated code.

    11. Bot detection. Devise and implement a mechanism to prohibit users from building an API on top of your product.

  4. Positioning Level Mitigations:

    1. Be appropriately transparent. It's important to provide the right level of transparency to people who use the system, so that they can make informed decisions around the use of the system.

    2. Provide system documentation. Produce and provide educational materials for your system, including explanations of its capabilities and limitations. For example, this could be in the form of a "learn more" page accessible via the system.

    3. Publish user guidelines and best practices. Help users and stakeholders use the system appropriately by publishing best practices, for example on prompt crafting, reviewing generations before accepting them, etc. Such guidelines can help people understand how the system works. When possible, incorporate the guidelines and best practices directly into the UX.

As you implement mitigations to address potential identified harms, it's important to develop a process for ongoing measurement of the effectiveness of such mitigations, to document measurement results, and to review those measurement results to continually improve the system.


Operate

Once measurement and mitigation systems are in place, we recommend that you define and execute a deployment and operational readiness plan. This stage includes

completing appropriate reviews of your system and mitigation plans with relevant stakeholders, establishing pipelines to collect telemetry and feedback, and developing an incident response and rollback plan.

Some recommendations for how to deploy and operate a system that uses the Azure OpenAI service with appropriate, targeted harms mitigations include:

  1. Work with compliance teams within your organization to understand what types of reviews are required for your system and when they are required (for example, legal review, privacy review, security review, accessibility review, etc.).

  2. Develop and implement the following:

    1. Develop a phased delivery plan. We recommend you launch systems using the Azure OpenAI service gradually using a "phased delivery" approach. This gives a limited set of people the opportunity to try the system, provide feedback, report issues and concerns, and suggest improvements before the system is released more widely. It also helps to manage the risk of unanticipated failure modes, unexpected system behaviors, and unexpected concerns being reported.

    2. Develop an incident response plan. Develop an incident response plan and evaluate the time needed to respond to an incident.

    3. Develop a rollback plan Ensure you can roll back the system quickly and efficiently in case an unanticipated incident occurs.

    4. Prepare for immediate action for unanticipated harms. Build the necessary features and processes to block problematic prompts and responses as they're discovered and as close to real-time as possible. When unanticipated harms do occur, block the problematic prompts and responses as quickly as possible, develop and deploy appropriate mitigations, investigate the incident, and implement a long-term solution.

    5. Develop a mechanism to block people who are misusing your system. Develop a mechanism to identify users who violate your content policies (for example, by generating hate speech) or are otherwise using your system for unintended or harmful purposes, and take action against further abuse. For example, if a user frequently uses your system to generate content that is blocked or flagged by content safety systems, consider blocking them from further use of your system. Implement an appeal mechanism where appropriate.

    6. Build effective user feedback channels. Implement feedback channels through which stakeholders (and the general public, if applicable) can submit feedback or report issues with generated content or that otherwise arise during their use of the system. Document how such feedback is processed, considered, and addressed. Evaluate the feedback and work to improve the system based on user feedback. One approach could be to include buttons with generated content that would allow users to identify content as "inaccurate," "harmful" or

      "incomplete." This could provide a more widely used, structured and feedback signal for analysis.

    7. Telemetry data. Identify and record (consistent with applicable privacy laws, policies, and commitments) signals that indicate user satisfaction or their ability to use the system as intended. Use telemetry data to identify gaps and improve the system.

This document is not intended to be, and should not be construed as providing, legal advice. The jurisdiction in which you're operating may have various regulatory or legal requirements that apply to your AI system. Consult a legal specialist if you are uncertain about laws or regulations that might apply to your system, especially if you think those might impact these recommendations. Be aware that not all of these recommendations and resources are appropriate for every scenario, and conversely, these recommendations and resources may be insufficient for some scenarios.


Learn more about responsible AI

Microsoft AI principles

Microsoft responsible AI resources

Microsoft Azure Learning courses on responsible AI


Learn more about Azure OpenAI

Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn

Code of Conduct for the Azure OpenAI Service | Microsoft Learn

Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn

Transparency Note for Azure OpenAI Service

Article • 05/19/2023


What is a Transparency Note?

An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.

Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the Microsoft's AI principles .


The basics of the Azure OpenAI Models

Azure OpenAI provides customers with a fully managed AI service that lets developers and data scientists apply OpenAI's powerful models including models that can generate natural language, code, and images. Within the Azure OpenAI Service, the OpenAI models are integrated with Microsoft-developed content filtering and abuse detection models. Learn more about content filtering here and abuse detection here.

Select the tabs to see content for the relevant model type.


Introduction



As part of the fully managed Azure OpenAI Service, the GPT-3 models analyze and generate natural language, Codex models analyze and generate code and plain text code commentary, and the GPT-4 models can understand and generate natural language and code. These models use an autoregressive architecture, meaning they

Text and code models

use data from prior observations to predict the most probable next word. This process is then repeated by appending the newly generated content to the original text to produce the complete generated response. Because the response is conditioned on the input text, these models can be applied to various tasks simply by changing the input text.

The GPT-3 series of models are pretrained on a wide body of publicly available free text data. This data is sourced from a combination of web crawling (specifically, a filtered version of Common Crawl , which includes a broad range of text from the internet and comprises 60 percent of the weighted pretraining dataset) and higher-quality datasets, including an expanded version of the WebText dataset, two internet-based books corpora and English-language Wikipedia. The GPT-4 base model was trained using publicly available data (such as internet data) and data that was licensed by OpenAI. The model was fine-tuned using reinforcement learning with human feedback (RLHF).

Learn more about the training and modeling techniques in OpenAI's GPT-3 , GPT-

4 , and Codex research papers. The guidance below is also drawn from OpenAI's safety best practices .


Key terms


Term

Definition

Prompt

The text you send to the service in the API call. This text is then input into the


model. For example, one might input the following prompt:



Convert the questions to a command:


Q: Ask Constance if we need some bread


A: send-msg 'find constance' Do we need some bread?


Q: Send a message to Greg to figure out if things are ready for


Wednesday.


A:

Completion The text Azure OpenAI outputs in response. For example, the service may or respond with the following answer to the above prompt: send-msg 'find Generation greg' figure out if things are ready for Wednesday.

Token

Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word hamburger gets broken up into the tokens ham , bur and ger , while a short and common word like pear is a single token. Many tokens start with a whitespace, for example hello and bye .

Capabilities


Text and code models


The GPT-4, GPT-3, and Codex Azure OpenAI Service models use natural language instructions and examples in the prompt to identify the task. The model then completes the task by predicting the most probable next text. This technique is known as "in-context" learning. These models are not retrained during this step but instead give predictions based on the context you include in the prompt.

There are three main approaches for in-context learning. These approaches vary based on the amount of task-specific data that is given to the model:

Few-shot : In this case, a user includes several examples in the prompt that demonstrate the expected answer format and content. The following example shows a few-shot prompt providing multiple examples:




Convert the questions to a command:

Q: Ask Constance if we need some bread

A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things areready for Wednesday.

A: send-msg `find greg` Is everything ready forWednesday? Q: Ask Ilya if we're still having our meeting thisevening

A: send-msg `find ilya` Are we still having a meetingthis evening?

Q: Contact the ski store and figure out if I can getmy skis fixed before I leave on Thursday

A: send-msg `find ski store` Would it be possible toget my skis fixed before I leave on Thursday?

Q: Thank Nicolas for lunch

A: send-msg `find nicolas` Thank you for lunch!

Q: Tell Constance that I won't be home before 19:30tonight — unmovable meeting.

A: send-msg `find constance` I won't be home before19:30 tonight. I have a meeting I can't move.

Q: Tell John that I need to book an appointment at10:30 A:

The number of examples typically ranges from 0 to 100 depending on how many can fit in the maximum input length for a single prompt. Few-shot learning enables a major reduction in the amount of task-specific data required for accurate predictions.

One-shot : This case is the same as the few-shot approach except only one example is provided. The following example shows a one-shot prompt:


Convert the questions to a command:

Q: Ask Constance if we need some bread

A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for Wednesday.

A:


Zero-shot: In this case, no examples are provided to the model and only the task request is provided. The following example shows a zero-shot prompt:


Convert the question to a command:

Q: Ask Constance if we need some bread A:


Use cases


Text and code models


Intended uses

The GPT-4, GPT-3, and Codex models in the Azure OpenAI service can be used in multiple scenarios. The following list isn't comprehensive, but it illustrates the diversity of tasks that can be supported with appropriate mitigations:

Chat and conversation interaction : Users can interact with a conversational agent that responds with responses drawn from trusted documents such as internal company documentation or tech support documentation.

Conversations must be limited to answering scoped questions.

Chat and conversation creation : Users can create a conversational agent that responds with responses drawn from trusted documents such as internal company documentation or tech support documentation. Conversations must be limited to answering scoped questions.

Code generation or transformation scenarios : For example, converting one programming language to another, generating docstrings for functions, converting natural language to SQL.

Journalistic content : For use to create new journalistic content or to rewrite journalistic content submitted by the user as a writing aid for predefined topics. Users cannot use the application as a general content creation tool for all topics. May not be used to generate content for political campaigns.

Question-answering : Users can ask questions and receive answers from

trusted source documents such as internal company documentation. The application does not generate answers ungrounded in trusted source documentation.

Reason over structured and unstructured data : Users can analyze inputs using classification, sentiment analysis of text, or entity extraction. Examples include analyzing product feedback sentiment, analyzing support calls and transcripts, and refining text-based search with embeddings.

Search : Users can search trusted source documents such as internal company

documentation. The application does not generate results ungrounded in trusted source documentation.

Summarization : Users can submit content to be summarized for predefined topics built into the application and cannot use the application as an open-ended summarizer. Examples include summarization of internal company documentation, call center transcripts, technical reports, and product reviews. Writing assistance on specific topics : Users can create new content or rewrite content submitted by the user as a writing aid for business content or predefined topics. Users can only rewrite or create content for specific business purposes or predefined topics and cannot use the application as a general content creation tool for all topics. Examples of business content include proposals and reports. For journalistic use, see above Journalistic content use case.


Considerations when choosing a use case

We encourage customers to use the Azure OpenAI GPT-4, GPT-3, and Codex models in their innovative solutions or applications as approved in their Limited Access registration form. However, here are some considerations when choosing a use case:

Not suitable for open-ended, unconstrained content generation. Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.

Not suitable for scenarios where up-to-date, factually accurate information is crucial unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service does not have information about events that occur after its training

date, likely has missing knowledge about some topics, and may not always produce factually accurate information.

Avoid scenarios where use or misuse of the system could result in significant physical or psychological injury to an individual. For example, scenarios that diagnose patients or prescribe medications have the potential to cause significant harm.

Avoid scenarios where use or misuse of the system could have a consequential impact on life opportunities or legal status. Examples include scenarios where the AI system could affect an individual's legal status, legal rights, or their access to credit, education, employment, healthcare, housing, insurance, social welfare benefits, services, opportunities, or the terms on which they're provided.

Avoid high stakes scenarios that could lead to harm. The models hosted by Azure OpenAI service reflect certain societal views, biases, and other undesirable content present in the training data or the examples provided in the prompt. As a result, we caution against using the models in high-stakes scenarios where unfair, unreliable, or offensive behavior might be extremely costly or lead to harm.

Carefully consider use cases in high stakes domains or industry: Examples include but are not limited to healthcare, medicine, finance, or legal.

Carefully consider well-scoped chatbot scenarios. Limiting the use of the service in chatbots to a narrow domain reduces the risk of generating unintended or undesirable responses.

Carefully consider all generative use cases. Content generation scenarios may be more likely to produce unintended outputs and these scenarios require careful consideration and mitigations.


Limitations

When it comes to large-scale natural language models and image models, there are particular fairness and responsible AI issues to consider. People use language and images to describe the world and to express their beliefs, assumptions, attitudes, and values. As a result, publicly available text and image data typically used to train large-scale natural language processing and image generation models contains societal biases relating to race, gender, religion, age, and other groups of people, as well as other undesirable content. These societal biases are reflected in the distributions of words, phrases, and syntactic structures.


Technical limitations, operational factors and ranges

Caution

Please be advised that this section contains illustrative examples which include terms and language that some individuals may find offensive.


Large-scale natural language and image models trained with such data can potentially behave in ways that are unfair, unreliable, or offensive, in turn causing harms. Some of the ways are listed here. We emphasize that these types of harms aren't mutually exclusive. A single model can exhibit more than one type of harm, potentially relating to multiple different groups of people. For example:

Allocation: These models can be used in ways that lead to unfair allocation of resources or opportunities. For example, automated resume screening systems can withhold employment opportunities from one gender if they're trained on resume data that reflects the existing gender imbalance in a particular industry. Or the DALL·E 2 model could be used to create imagery in the style of a known artist, which could affect the value of the artist's work or the artist's life opportunities.

Quality of service: The Azure OpenAI models are trained primarily on English text and images with English text descriptions. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance. The publicly available images used to train the DALL·E models might reinforce public bias and other undesirable content. The models are also unable to generate consistent photorealistic images and comprehensive text at this time.

Stereotyping: These models can reinforce stereotypes. For example, when

translating "He is a nurse" and "She is a doctor" into a genderless language such as Turkish and then back into English, many machine translation systems yield the stereotypical (and incorrect) results of "She is a nurse" and "He is a doctor." With DALL·E 2, when generating an image based on the prompt "Fatherless children," the model could generate images of Black children only, reinforcing harmful stereotypes that may exist in publicly available images.

Demeaning: These models can demean people. For example, an open-ended content generation system with inappropriate or insufficient mitigations might produce offensive or demeaning to a particular group of people.

Overrepresentation and underrepresentation: These models can over- or under-represent groups of people, or even erase their representation entirely. For example, if text prompts that contain the word "gay" are detected as potentially harmful or offensive, this could to the underrepresentation or even erasure of legitimate image generations by or about the LGBTQIA+ community.

Inappropriate or offensive content: These models can produce other types of inappropriate or offensive content. Examples include the ability to create images that potentially contain harmful artifacts such as hate symbols; images that illicit harmful connotations; images that relate to contested, controversial, or ideologically polarizing topics; images that are manipulative; images that contain sexually charged content that isn't caught by sexual-related content filters; and images that relate to sensitive or emotionally charged topics. For example, a well-intentioned text prompt aimed to create an image of the New York skyline with clouds and airplanes flying over it might unintentionally generate images that illicit sentiments related to the events surrounding 9/11.

Disinformation and misinformation about sensitive topics: Because DALL·E 2 is a powerful image generation model, it can be used to produce disinformation and misinformation that can be extremely harmful. For example, the model might generate an image of a political leader engaging in activity of a violent or sexual (or simply inaccurate) nature that might lead to defamation and other consequential harms, including but not limited to public protests, political change, or fake news.

Information reliability: Language model responses can fabricate content that may

sound reasonable but is nonsensical or inaccurate with respect to external validation sources. Even when drawing responses from trusted source information, responses may misrepresent that content.

False information: Azure OpenAI doesn't fact-check or verify content that is provided by customers or users. Depending on how you've developed your application, it might produce false information unless you've built in mitigations (see Best practices for improving system performance below).


System performance

In many AI systems, performance is often defined in relation to accuracy—that is, how often the AI system offers a correct prediction or output. With large-scale natural language models and image models, two different users might look at the same output and have different opinions of how useful or relevant it is, which means that performance for these systems must be defined more flexibly. Here, we broadly consider performance to mean that the application performs as you and your users expect, including not generating harmful outputs.

Azure OpenAI service can support a wide range of applications like search, classification, code generation, and image generation, each with different performance metrics and mitigation strategies. There are several steps you can take to mitigate some of the concerns listed under "Limitations" and to improve performance. Other important

mitigation techniques are outlined in the section Evaluating and integrating Azure OpenAI for your use below.


Best practices for improving system performance

Show and tell when designing prompts. With text and code models, make it clear to the model what kind of outputs you expect through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that's what you want.

Keep your application on topic. Carefully structure prompts and image inputs to reduce the chance of producing undesired content, even if a user tries to use it for this purpose. For instance, you might indicate in your prompt that a chatbot only engages in conversations about mathematics and otherwise responds "I'm sorry. I'm afraid I can't answer that." Adding adjectives like "polite" and examples in your desired tone to your prompt can also help steer outputs. With image models, you might indicate in your prompt or image input that your application generates only conceptual images. It might otherwise generate a pop-up notification that explains that the application is not for photorealistic use or to portray reality. Consider nudging users toward acceptable queries and image inputs, either by listing such examples up front or by offering them as suggestions upon receiving an off-topic request. Consider training a classifier to determine whether an input (prompt or image) is on topic or off topic.

Provide quality data. With text and code models, if you're trying to build a classifier or get the model to follow a pattern, make sure that there are enough examples. Be sure to proofread your examples—the model is usually smart enough to see through basic spelling mistakes and give you a response, but it also might assume this is intentional and it could affect the response. Providing quality data also includes giving your model reliable data to draw responses from in chat and question answering systems.

Measure model quality. As part of general model quality, consider measuring and improving fairness-related metrics and other metrics related to responsible AI in addition to traditional accuracy measures for your scenario. Consider resources like this checklist when you measure the fairness of the system. These measurements come with limitations, which you should acknowledge and communicate to stakeholders along with evaluation results.

Limit the length, structure, and rate of inputs and outputs. Restricting the length or structure of inputs and outputs can increase the likelihood that the application will stay on task and mitigate, at least in part, any potentially unfair, unreliable, or offensive behavior. Other options to reduce the risk of misuse include (i) restricting

the source of inputs (for example, limiting inputs to a particular domain or to authenticated users rather than being open to anyone on the internet) and (ii) implementing usage rate limits.

Encourage human review of outputs prior to publication or dissemination. With generative AI, there is potential for generating content that might be offensive or not related to the task at hand, even with mitigations in place. To ensure that the generated output meets the task of the customer, consider building ways to remind customers to review their outputs for quality prior to sharing widely. This can reduce many different harms, including offensive material, disinformation, and more.

Implement additional scenario-specific mitigations. Refer to the mitigations outlined in Evaluating and integrating Azure OpenAI for your use including content moderation strategies. These do not represent every mitigation that might be required for your application, but they point to the general minimum baseline we check for when approving use cases for Azure OpenAI Service.


Evaluating and integrating Azure OpenAI for your use

For additional information on how to evaluate and integrate these models responsibly, please see the RAI Overview document.


Learn more about responsible AI

Microsoft AI principles

Microsoft responsible AI resources

Microsoft Azure Learning courses on responsible AI


Learn more about Azure OpenAI

Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn

Code of Conduct for the Azure OpenAI Service | Microsoft Learn

Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn

Limited access to Azure OpenAI Service

Article • 04/28/2023


As part of Microsoft's commitment to responsible AI, we are designing and releasing Azure OpenAI Service with the intention of protecting the rights of individuals and society and fostering transparent human-computer interaction. For this reason, we currently limit the access and use of Azure OpenAI, including limiting access to the ability to modify content filters and/or abuse monitoring.


Registration process

Azure OpenAI requires registration and is currently only available to approved enterprise customers and partners. Customers who wish to use Azure OpenAI are required to submit a registration form .

Customers must attest to any and all use cases for which they will use the service (the use cases from which customers may select will populate in the form after selection of the desired model(s) in Question 22 in the initial registration form). Customers who wish to add additional use cases after initial onboarding must submit the additional use cases using this form . The use of Azure OpenAI is limited to use cases that have been selected in a registration form. Microsoft may require customers to re-verify this information. Read more about example use cases and use cases to avoid here.

Customers who wish to modify content filters and modify abuse monitoring after they have onboarded to the service are subject to additional eligibility criteria and scenario restrictions. At this time, modified content filters and/or modified abuse monitoring for Azure OpenAI Service are only available to managed customers and partners working with Microsoft account teams and have additional use case restrictions. Customers meeting these requirements can register here .

Access to the Azure OpenAI Service is subject to Microsoft's sole discretion based on eligibility criteria and a vetting process, and customers must acknowledge that they have read and understand the Azure terms of service for Azure OpenAI Service.

Azure OpenAI Service is made available to customers under the terms governing their subscription to Microsoft Azure Services, including the Azure OpenAI section of the Microsoft Product Terms . Please review these terms carefully as they contain important conditions and obligations governing your use of Azure OpenAI Service.


Important links

Register to use Azure OpenAI

Add additional use cases (if needed)

Register to modify content filters and abuse monitoring (if needed)


Help and support

FAQ about Limited Access can be found here. If you need help with Azure OpenAI, find support here. Report abuse of Azure OpenAI here .

Report problematic content to cscraireport@microsoft.com.


See also

Code of conduct for Azure OpenAI Service integrations Transparency note for Azure OpenAI Service Characteristics and limitations for Azure OpenAI Service Data, privacy, and security for Azure OpenAI Service

Code of conduct for Azure OpenAI Service

Article • 03/13/2023


The following Code of Conduct defines the requirements that all Azure OpenAI Service implementations must adhere to in good faith. This code of conduct is in addition to the Acceptable Use Policy in the Microsoft Online Services Terms .


Access requirements

Azure OpenAI Service is a Limited Access service that requires registration and is only available to approved enterprise customers and partners. Customers who wish to use this service are required to register through this form . To learn more, see Limited Access to Azure OpenAI Service.


Responsible AI mitigation requirements

Integrations with Azure OpenAI Service must: Implement meaningful human oversight

Implement strong technical limits on inputs and outputs to reduce the likelihood of misuse beyond the application's intended purpose

Test applications thoroughly to find and mitigate undesirable behaviors Establish feedback channels

Implement additional scenario-specific mitigations To learn more, see the Azure OpenAI transparency note.

Integrations with Azure OpenAI Service must not:

be used in any way that violates Microsoft’s Acceptable Use Policy , including but not limited to any use prohibited by law, regulation, government order, or decree, or any use that violates the rights of others;

be used in any way that is inconsistent with this code of conduct, including the Limited Access requirements, the Responsible AI mitigation requirements, and the Content requirements;

exceed the use case(s) you identified to Microsoft in connection with your request to use the service;

interact with individuals under the age of consent in any way that could result in exploitation or manipulation or is otherwise prohibited by law or regulation; generate or interact with content prohibited in this Code of Conduct;

be presented alongside or monetize content prohibited in this Code of Conduct; make decisions without appropriate human oversight if your application may have a consequential impact on any individual’s legal position, financial position, life opportunities, employment opportunities, human rights, or result in physical or psychological injury to an individual;

infer sensitive information about people without their explicit consent unless if used in a lawful manner by a law enforcement entity, court, or government official subject to judicial oversight in a jurisdiction that maintains a fair and independent judiciary; or

be used for chatbots that (i) are erotic, romantic, or used for companionship

purposes, or which are otherwise prohibited by this Code of Conduct; (ii) are personas of specific people without their explicit consent; (iii) claim to have special wisdom/insight/knowledge, unless very clearly labeled as being for entertainment purposes only; or (iv) enable end users to create their own chatbots without oversight.


Content requirements

We prohibit the use of our service for generating content that can inflict harm on individuals or society. Our content policies are intended to improve the safety of our platform.

These content requirements apply to the output of all models developed by OpenAI and hosted in Azure OpenAI, such as GPT-3, GPT-4, Codex models, and DALL·E 2, and includes content provided as input to the service and content generated as output from the service.


Exploitation and Abuse


Child sexual exploitation and abuse

Azure OpenAI Service prohibits content that describes, features, or promotes child sexual exploitation or abuse, whether or not prohibited by law. This includes sexual content involving a child or that sexualizes a child.

Grooming

Azure OpenAI Service prohibits content that describes or is used for purposes of grooming of children. Grooming is the act of an adult building a relationship with a child for the purposes of exploitation, especially sexual exploitation. This includes communicating with a child for the purpose of sexual exploitation, trafficking, or other forms of exploitation.


Non-consensual intimate content

Azure OpenAI Service prohibits content that describes, features, or promotes non-consensual intimate activity.


Sexual solicitation

Azure OpenAI Service prohibits content that describes, features, or promotes, or is used for, purposes of solicitation of commercial sexual activity and sexual services. This includes encouragement and coordination of real sexual activity.


Trafficking

Azure OpenAI Service prohibits content describing or used for purposes of human trafficking. This includes the recruitment of individuals, facilitation of transport, and payment for, and the promotion of, exploitation of people such as forced labor, domestic servitude, sexual slavery, forced marriages, and forced medical procedures.


Suicide and Self-Injury

Azure OpenAI Service prohibits content that describes, praises, supports, promotes, glorifies, encourages and/or instructs individual(s) on self-injury or to take their life.


Violent Content and Conduct


Graphic violence and gore

Azure OpenAI Service prohibits content that describes, features, or promotes graphic violence or gore.


Terrorism and Violent Extremism

Azure OpenAI Service prohibits content that depicts an act of terrorism; praises, or supports a terrorist organization, terrorist actor, or violent terrorist ideology; encourages terrorist activities; offers aid to terrorist organizations or terrorist causes; or aids in recruitment to a terrorist organization.


Violent Threats, Incitement, and Glorification of Violence

Azure OpenAI Service prohibits content advocating or promoting violence toward others through violent threats or incitement.


Harmful Content


Hate speech and discrimination

Azure OpenAI Service prohibits content that attacks, denigrates, intimidates, degrades, targets, or excludes individuals or groups on the basis of traits such as actual or perceived race, ethnicity, national origin, gender, gender identity, sexual orientation, religious affiliation, age, disability status, caste, or any other characteristic that is associated with systemic prejudice or marginalization.


Bullying and harassment

Azure OpenAI Service prohibits content that targets individual(s) or group(s) with threats, intimidation, insults, degrading or demeaning language or images, promotion of physical harm, or other abusive behavior such as stalking.


Deception, disinformation, and inauthentic activity

Azure OpenAI Service prohibits content that is intentionally deceptive and likely to adversely affect the public interest, including deceptive or untrue content relating to health, safety, election integrity, or civic participation. Azure OpenAI Service also prohibits inauthentic interactions, such as fake accounts, automated inauthentic activity, impersonation to gain unauthorized information or privileges, and claims to be from any person, company, government body, or entity without explicit permission to make that representation.


Active malware or exploits

Content that directly supports unlawful active attacks or malware campaigns that cause technical harms, such as delivering malicious executables, organizing denial of service

attacks, or managing command and control servers.


Additional content policies

We prohibit the use of our Azure OpenAI Service for scenarios in which the system is likely to generate undesired content due to limitations in the models or scenarios in which the system cannot be applied in a way that properly manages potential negative consequences to people and society. Without limiting the foregoing restriction, Microsoft reserves the right to revise and expand the above Content requirements to address specific harms to people and society.

This includes prohibiting content that is sexually graphic, including consensual pornographic content and intimate descriptions of sexual acts, as well as content that may influence the political process, such as an election, passage of legislation, and content for campaigning purposes.

We may at times limit our service's ability to respond to particular topics, such as probing for personal information or seeking opinions on sensitive topics or current events.

We prohibit the use of Azure OpenAI Service for activities that significantly harm other individuals, organizations, or society, including but not limited to use of the service for purposes in conflict with the applicable Azure Legal Terms and the Microsoft Product Terms .


Report abuse

If you suspect that Azure OpenAI Service is being used in a manner that is abusive or illegal, infringes on your rights or the rights of other people, or violates these policies, you can report it at the Report Abuse Portal .


Report problematic content

If Azure OpenAI Service outputs problematic content that you believe should have been filtered, report it at cscraireport@microsoft.com.


See also

Limited access to Azure OpenAI Service Transparency note for Azure OpenAI Service

Data, privacy, and security for Azure OpenAI Service

Data, privacy, and security for Azure OpenAI Service

Article • 04/05/2023


This article provides details regarding how data provided by you to the Azure OpenAI service is processed, used, and stored. Azure OpenAI stores and processes data to provide the service and to monitor for uses that violate the applicable product terms. Please also see the Microsoft Products and Services Data Protection Addendum , which governs data processing by the Azure OpenAI Service except as otherwise provided in the applicable Product Terms .

Azure OpenAI was designed with compliance, privacy, and security in mind; however, the customer is responsible for its use and the implementation of this technology.


What data does the Azure OpenAI Service process?

Azure OpenAI processes the following types of data:


Prompts and completions. Prompts are submitted by the user, and completions are output by the service, via the completions (/completions, /chat/completions) and embeddings operations.

Training & validation data. You can provide your own training data consisting of prompt-completion pairs for the purposes of fine-tuning an OpenAI model.

Results data from training process. After training a fine-tuned model, the service will output meta-data on the job which includes tokens processed and validation scores at each step.


How does the Azure OpenAI Service process data?

The diagram below illustrates how your data is processed. This diagram covers three different types of processing:

  1. How the Azure OpenAI Service creates a fine-tuned (custom) model with your training data;

  2. How the Azure OpenAI Service processes your text prompts to generate completions and embeddings results; and

  3. How the Azure OpenAI Service and Microsoft personnel analyze prompts and completions for abuse, misuse or harmful content generation, or for debugging purposes in the event of a failure.



Training data for purposes of fine-tuning an OpenAI model

The training data (prompt-completion pairs) submitted to the Fine-tunes API through the Azure OpenAI Studio is pre-processed using automated tools for quality checking including a data format check. The training data is then imported to the model training component on the Azure OpenAI platform. During the training process, the training data is decomposed into batches and used to modify the weights of the OpenAI models.

Training data provided by the customer is only used to fine-tune the customer's model and is not used by Microsoft to train or improve any Microsoft models.


Text prompts to generate completions and embeddings results

Once a model deployment (consisting of a customer's fine-tuned model or a base model endpoint) is provisioned in a customer's Azure OpenAI resource, the customer can submit text prompts to the model using our Completions or Embeddings operations

through the REST API, client libraries, or the Azure OpenAI Studio; the model generates text outputs (completions) that are returned through the API.

When data is submitted to the service, it is processed through our content filters as well as the specified OpenAI model. The content filtering models are run on both the prompt inputs as well as the generated completions.

No prompts or completions are stored in the model during these operations, and prompts and completions are not used to train, retrain or improve the models.


Preventing abuse and harmful content generation

The Azure OpenAI Service includes a content management system that works alongside the models to filter potentially harmful content. This system works by running both the input prompt and generated completion through an ensemble of classification models aimed at detecting misuse. If the system identifies harmful content, customers receive either an error on the API call if the prompt was deemed inappropriate or the finish_reason on the response will be content_filter to signify that some of the generation was filtered. Learn more about content filtering here. No prompts or completions are stored in these content classification models; prompts and completions are not used to train, retrain or improve the classification models.

In addition to synchronous content filtering, the Azure OpenAI Service stores prompts and completions from the service for up to thirty (30) days to monitor for content and/or behaviors that suggest use of the service in a manner that may violate applicable product terms. Authorized Microsoft employees may review prompt and completion data that has triggered our automated systems to investigate and verify potential abuse. For customers who have deployed Azure OpenAI Service in the European Economic Area, the authorized Microsoft employees will be located in the European Economic Area.

In the event of a confirmed policy violation, a customer may be asked to take immediate action to remediate the issue to and to prevent further abuse. Failure to address the issue may result in suspension or termination of Azure OpenAI resource access.

Customers may request to modify content filtering and/or abuse monitoring by submitting the form here . If a customer is approved and remains in compliance with all requirements to modify abuse monitoring, then prompts and completions are not stored.

How is data retained and what Customer controls are available?

Training, validation, and training results data. The Files API allows customers to upload their training data for the purpose of fine-tuning a model. This data is stored in Azure Storage, encrypted at rest by Microsoft Managed keys, within the same region as the resource and logically isolated with their Azure subscription and API Credentials. Uploaded files can be deleted by the user via the DELETE API operation.

Fine-tuned OpenAI models. The Fine-tunes API allows customers to create their own fine-tuned version of the OpenAI models based on the training data that they have uploaded to the service via the Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region, encrypted at rest and logically isolated with their Azure subscription and API credentials. Fine-tuned models can be deleted by the user by calling the DELETE API operation.

Prompts and completions. The prompts and completions data may be temporarily stored by the Azure OpenAI Service in the same region as the resource for up to 30 days. This data is encrypted and is only accessible to authorized Microsoft employees for (1) debugging purposes in the event of a failure, and (2) investigating patterns of abuse and misuse to determine if the service is being used in a manner that violates the applicable product terms. Note: When a customer is approved for modified abuse monitoring, prompts and completions data are not stored, and thus Microsoft employees have no access to the data.

To learn more about Microsoft's privacy and security commitments visit the Microsoft Trust Center .


Frequently asked questions

Can a customer opt out of the logging and human review process?

Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional

Limited Access eligibility criteria and attest to specific use cases to apply to modify the Azure OpenAI content management features.

If Microsoft approves a customer's request to modify abuse monitoring, then Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured off. In this case, because no prompts and completions are stored at rest in the Service Results Store, the human review process is not possible and is not performed.



Customers can apply for modified abuse monitoring here: https://aka.ms/oai/modifiedaccess .


How can a customer verify if logging for abuse monitoring is off?

There are two ways for an approved customer to verify that logging for abuse monitoring has been turned off in their approved Azure subscription: (1) Azure portal or

(2) Azure CLI (or any MGMT API).


Note

Note: The value of "false" for the "ContentLogging" attribute appears only if logging is turned off. Otherwise, this property will not appear in either Azure portal

or Azure CLI's output. If a customer wants to verify if the logging is on, they would need to raise a support ticket.


Prerequisites


Sign into Azure

Select the Azure Subscription which hosts the Azure OpenAI Service resource. Navigate to the "Overview" page of the Azure OpenAI Service resource.

Logging Status Verification via the Azure portal


Go to the resource Overview page


Click on the 'JSON view' link on the top right corner as shown in the image below



There will be a value in the Capabilities list called 'ContentLogging' which will appear and be set to FALSE when logging for abuse monitoring is off.


JSON


...


{



"name":"ContentLogging",


"value":"false"

}


...


Logging Status Verification via the Azure CLI (or MGMT API)


Execute the following command in Azure CLI to see the same JSON data as shown in the Azure portal above.





az cognitiveservices account show -n resource\_name -g resource\_group


Is customer data logged with content filtering?

No, content filtering functions differently than abuse monitoring, and logging or storage of data is not needed. Content filtering works by applying algorithmic detection to the prompts and completions at inference time to determine if content should be filtered.

No data is stored at rest as part of this process, and thus there is no human review of prompts and/or completions that have been subject to filtering. You can learn more about content filtering in the service here: Azure OpenAI Service content filtering -Azure OpenAI | Microsoft Learn.

You can also apply to modify Azure OpenAI content filtering here: Azure OpenAI Limited Access Review: Modified Content Filters and Abuse Monitoring (microsoft.com) . As described above, although customers use the same form to request modification of content filtering and/or abuse monitoring, the impact on data logging and access differs depending on which aspects of the content management system are modified.


How can a customer verify if content filtering is off?

If a customer wants to verify that content filtering for their approved subscription ID(s) has been configured off, they can create a support ticket in the Azure Portal in one of two ways:

  1. From the main portal dashboard in Help + Support in the side navigation bar

  2. Within the Azure OpenAI resource itself in New Support Request in the side navigation bar


What is covered by customer managed key encryption?

Customer-managed keys (CMK), also known as Bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data. CMK encrypts all customer data stored at rest in the Azure OpenAI Service (such as data uploaded for fine-tuning) except for data logged for 30 days as described above. (Learn more: Azure OpenAI Service encryption of data at rest - Azure Cognitive Services | Microsoft Learn)


What happens if Microsoft needs access to my data?

Most operations, support, and troubleshooting performed by Microsoft personnel and sub-processors do not require access to customer data. In those rare circumstances where such access is required, Customer Lockbox for Microsoft Azure provides an interface for customers to review and approve or reject customer data access requests. Customer Lockbox is used in cases where a Microsoft engineer needs to access customer data, whether in response to a customer-initiated support ticket or a problem identified by Microsoft. In the Azure OpenAI Service, Customer Lockbox applies to all customer data stored by the service (such as data uploaded for fine-tuning) except for the prompts and completions logged and accessed for the purposes of abuse monitoring as described above.

You can learn more about customer lockbox here: Customer Lockbox for Microsoft Azure | Microsoft Learn


Is customer data processed by Azure OpenAI sent to OpenAI?

No. Microsoft hosts the OpenAI models within our Azure infrastructure, and all customer data sent to Azure OpenAI remains within the Azure OpenAI service.


Is customer data used to train the OpenAI models?

No. We do not use customer data to train, retrain or improve the models in the Azure OpenAI Service.


See also

Limited access to Azure OpenAI Service

Code of conduct for Azure OpenAI Service integrations Transparency note and use cases for Azure OpenAI Service Characteristics and limitations for Azure OpenAI Service

Report abuse of Azure OpenAI Service through the Report Abuse Portal Report problematic content to cscraireport@microsoft.com

Azure OpenAI Service REST API reference

Article • 04/06/2023


This article provides details on the inference REST API endpoints for Azure OpenAI.


Authentication

Azure OpenAI provides two methods for authentication. you can use either API Keys or Azure Active Directory.

API Key authentication: For this type of authentication, all API requests must

api-key

include the API Key in the HTTP header. The Quickstart provides guidance

for how to make calls with this type of authentication.


Azure Active Directory authentication: You can authenticate an API call using an Azure Active Directory token. Authentication tokens are included in a request as

Authorization

the header. The token provided must be preceded by Bearer , for

example Bearer YOUR_AUTH_TOKEN . You can read our how-to guide on authenticating with Azure Active Directory.


REST API versioning

api-version

The service APIs are versioned using the

follow the YYYY-MM-DD date structure. For example:


query parameter. All versions


HTTP


POST

https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/completions?api-version=2022-12-01


Completions

With the Completions operation, the model will generate one or more predicted completions based on a provided prompt. The service can also return the probabilities of alternative tokens at each position.

Create a completion



HTTP


POST https://{your-resource-

name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-version={api-version}


Path parameters


Parameter Type Required? Description

your-resource- string Required The name of your Azure OpenAI Resource.

name

deployment-id string Required The deployment name you chose when you deployed the

model.

api-version string Required The API version to use for this operation. This follows the

YYYY-MM-DD format.


Supported versions


2023-03-15-preview

Swagger spec

2022-12-01

Swagger spec


Request body


Parameter

Type

Required?

Default

Description


prompt

string or array

Optional


<\|endoftext\|>

The prompt(s) to generate completions for, encoded as a string, a list of strings, or a list of token lists. Note that

<\|endoftext\|> is the document separator that the model sees during training, so if a prompt isn't specified the model will generate as if from the beginning of a new document.


Parameter

Type

Required?

Default

Description


max_tokens

integer

Optional

16

The maximum number of tokens to generate in the completion.

The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).


temperature

number

Optional

1

What sampling temperature to use, between 0 and 2. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 ( argmax sampling ) for ones with a well-defined answer. We generally recommend altering this or top_p but not both.


top_p

number

Optional

1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.


Parameter

Type

Required?

Default

Description


logit_bias

map

Optional

null

Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs.

Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like

-100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.


user

string

Optional


A unique identifier representing your end-user, which can help monitoring and detecting abuse


n

integer

Optional

1

How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop.


stream

boolean

Optional

False

Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.


Parameter

Type

Required?

Default

Description


logprobs

integer

Optional

null

Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. the API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. This parameter cannot be used with gpt-35-turbo .


suffix

string

Optional

null

The suffix that comes after a completion of inserted text.


echo

boolean

Optional

False

Echo back the prompt in addition to the completion. This parameter cannot be used with gpt-35-turbo .


stop

string or array

Optional

null

Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence.


presence_penalty

number

Optional

0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.


frequency_penalty

number

Optional

0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.


Parameter

Type

Required?

Default

Description


best_of

integer

Optional

1

Generates best_of completions server-side and returns the "best" (the one with the lowest log probability per token).

Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with gpt-35-turbo .


Example request


Console


curl

https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/completions?api-version=2022-12-01\

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d "{

\"prompt\": \"Once upon a time\",

\"max_tokens\": 5

}"


Example response


JSON


{

"id": "cmpl-4kGh7iXtjW4lc9eGhff6Hp8C7btdQ", "object": "text_completion",

"created": 1646932609, "model": "ada",

"choices": [

{

"text": ", a dark line crossed",


"index": 0,

"logprobs": null,

"finish_reason": "length"

}

]

}


In the example response,

equals stop . If

equals

finish_reason

finish_reason

content_filter

consult our content filtering guide to understand why this is occurring.


Embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and other algorithms.


Note

We currently do not support batching of embeddings into a single API call. If you receive the error InvalidRequestError: Too many inputs. The max number of inputs is 1. We hope to increase the number of inputs per request soon. , this typically occurs when an array of embeddings is attempted to be passed as a batch rather than a single string. The string can be up to 8191 tokens in length when using the text-embedding-ada-002 (Version 2) model.

Create an embedding


HTTP


POST https://{your-resource-

name}.openai.azure.com/openai/deployments/{deployment-id}/embeddings?api-version={api-version}

Path parameters


Parameter Type Required? Description

your- string Required The name of your Azure OpenAI Resource.

resource-name

deployment- string Required The name of your model deployment. You're required to first

id deploy a model before you can make calls


Parameter

Type

Required?

Description


api-version

string

Required

The API version to use for this operation. This follows the YYYY-MM-DD format.


Supported versions


2023-03-15-preview

Swagger spec

2022-12-01

Swagger spec


Request body


Parameter

Type

Required?

Default

Description


input

string

Yes

N/A

Input text to get embeddings for, encoded as a string. The number of input tokens varies depending on what model you are using.

Unless you're embedding code, we suggest replacing newlines (\n) in your input with a single space, as we have observed inferior results when newlines are present.


user

string

No

Null

A unique identifier representing for your end-user. This will help Azure OpenAI monitor and detect abuse. Do not pass PII identifiers instead use

pseudoanonymized values such as GUIDs


Example request


Console


curl

https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/embeddings?api-version=2022-12-01 \

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d "{\"input\": \"The food was delicious and the waiter...\"}"


Example response


JSON


{

"object": "list", "data": [

{


"object": "embedding", "embedding": [

0.018990106880664825,

-0.0073809814639389515,

.... (1024 floats total for ada) 0.021276434883475304,

],

"index": 0

}

],

"model": "text-similarity-babbage:001"

}


Chat completions

api-version=2023-03-15-

Create completions for chat messages with the ChatGPT (preview) and GPT-4 (preview) models. Chat completions are currently only available with

preview .


Create chat completions


HTTP


POST https://{your-resource-

name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions? api-version={api-version}

Path parameters


Parameter

Type

Required?

Description


your-resource-name

string

Required

The name of your Azure OpenAI Resource.


deployment-id

string

Required

The name of your model deployment. You're required to first deploy a model before you can make calls


api-version

string

Required

The API version to use for this operation. This follows the YYYY-MM-DD format.


Supported versions


2023-03-15-preview

Swagger spec


Example request


Console


curl

https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/chat/completions?api-version=2023-03-15-preview \

-H "Content-Type: application/json" \

-H "api-key: YOUR_API_KEY" \

-d '{"messages":[{"role": "system", "content": "You are a helpful

assistant."},{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},{"role": "user", "content": "Do other Azure Cognitive Services support this too?"}]}'


Example response


Console


{"id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9", "object":"chat.completion","created":1679072642, "model":"gpt-35-turbo",

"usage":{"prompt_tokens":58, "completion_tokens":68,

"total_tokens":126},

"choices":[{"message":{"role":"assistant",

"content":"Yes, other Azure Cognitive Services also support customer managed keys. Azure Cognitive Services offer multiple options for customers to

manage keys, such as using Azure Key Vault, customer-managed keys in Azure

Key Vault or customer-managed keys through Azure Storage service. This helps customers ensure that their data is secure and access to their services is

controlled."},"finish_reason":"stop","index":0}]}


finish_reason

In the example response,

equals stop . If

equals

finish_reason

content_filter

consult our content filtering guide to understand why this is occurring.


Output formatting adjusted for ease of reading, actual output is a single block of text without line breaks.


Parameter

Type

Required?

Default

Description


messages

array

Required

The messages to generate chat completions for, in the chat format.


Parameter

Type

Required?

Default

Description


temperature

number

Optional

1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.\nWe generally recommend altering this or top_p but not both.


n

integer

Optional

1

How many chat completion choices to generate for each input message.


stream

boolean

Optional

false

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message."


stop

string or array

Optional

null

Up to 4 sequences where the API will stop generating further tokens.


max_tokens

integer

Optional

inf

The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens).


presence_penalty

number

Optional

0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.


frequency_penalty

number

Optional

0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.


Parameter

Type

Required?

Default

Description


logit_bias

object

Optional

null

Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.

Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.


user

string

Optional


A unique identifier representing your end-user, which can help Azure OpenAI to monitor and detect abuse.


Management APIs

Azure OpenAI is deployed as a part of the Azure Cognitive Services. All Cognitive Services rely on the same set of management APIs for creation, update and delete operations. The management APIs are also used for deploying models within an OpenAI resource.

Management APIs reference documentation


Next steps

Learn about managing deployments, models, and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.

Fine Tunes

Reference

Service: Cognitive Services API Version: 2022-12-01


Operations


Cancel

Cancels the processing of the fine-tune job specified by the given fine-tune-id.

Create

Creates a job that fine-tunes a specified model from a given training file. Response includes details of the enqueued job including job status and hyper paramete...

Delete

Deletes the fine-tune job specified by the given fine-tune-id.

Get

Gets details for a single fine-tune job specified by the given fine-tune-id. The details contain the base model, training and validation files, hyper parameters,...

Get Events

Gets the events for the fine-tune job specified by the given fine-tune-id. Events are created when the job status changes, e.g. running or complete, and when res...

List

Gets a list of all fine-tune jobs owned by the Azure OpenAI resource. The details that are returned for each fine-tune job contain besides its identifier the base...

Deployments - Create Or Update

Reference

Service: Cognitive Services API Version: 2021-10-01

Update the state of specified deployments associated with the Cognitive Services account.


HTTP


PUT

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Cognitiv eServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2021-10-01


URI Parameters



Name


In


Required


Type


Description


accountName


path


True


string

The name of Cognitive Services account. Regex pattern: ^[a-zA-Z0-9][a-zA-Z0-9_.-]*$


deploymentName


path


True


string


The name of the deployment associated with the Cognitive Services Account


resourceGroupName


path


True


string


The name of the resource group. The name is case insensitive.


subscriptionId


path


True


string


The ID of the target subscription.


api-version


query


True


string


The API version to use for this operation.

Request Body



Name


Type


Description


properties


DeploymentProperties


Properties of Cognitive Services account deployment.

Responses



Name


Type


Description


200 OK


Deployment


Create/Update the deployment successfully.


201 Created


Deployment


Create the deployment successfully.


Other Status Codes


ErrorResponse


Error response describing why the operation failed.

Examples PutDeployment Sample Request

HTTP

HTTP


PUT

https://management.azure.com/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.Cognitiv eServices/accounts/accountName/deployments/deploymentName?api-version=2021-10-01


{

"properties": { "model": {

"format": "OpenAI",

"name": "ada",

"version": "1"

},

"scaleSettings": {

"scaleType": "Manual", "capacity": 1

}

}

}


Sample Response

Status code: 200


JSON


{

"id":

"/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/ deployments/deploymentName",

"name": "deploymentName",

"type": "Microsoft.CognitiveServices/accounts/deployments", "properties": {

"model": {

"format": "OpenAI",

"name": "ada",

"version": "1"

},

"scaleSettings": {

"scaleType": "Manual", "capacity": 1

},

"provisioningState": "Succeeded"

}

}


Status code: 201


JSON


{

"id":

"/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/ deployments/deploymentName",

"name": "deploymentName",

"type": "Microsoft.CognitiveServices/accounts/deployments", "properties": {

"model": {

"format": "OpenAI",

"name": "ada",

"version": "1"

},

"scaleSettings": {

"scaleType": "Manual", "capacity": 1

},

"provisioningState": "Accepted"

}

}


Definitions



Name Description


createdByType


The type of identity that created the resource.


Deployment


Cognitive Services account deployment.


DeploymentModel


Properties of Cognitive Services account deployment model.


DeploymentProperties


Properties of Cognitive Services account deployment.


DeploymentProvisioningState


Gets the status of the resource at the time the operation was called.


DeploymentScaleSettings


Properties of Cognitive Services account deployment model.


DeploymentScaleType


Deployment scale type.


ErrorAdditionalInfo


The resource management error additional info.


ErrorDetail


The error detail.


ErrorResponse


Error response


systemData


Metadata pertaining to creation and last modification of the resource.


createdByType

The type of identity that created the resource.



Name Type Description


Application string


Key string


ManagedIdentity string


User string


Deployment

Cognitive Services account deployment.



Name


Type


Description


etag


string


Resource Etag.


id


string


Fully qualified resource ID for the resource. Ex -

/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}


name


string


The name of the resource


properties


Deployment Properties


Properties of Cognitive Services account deployment.


systemData


systemData


Metadata pertaining to creation and last modification of the resource.


type


string


The type of the resource. E.g. "Microsoft.Compute/virtualMachines" or "Microsoft.Storage/storageAccounts"

DeploymentModel

Properties of Cognitive Services account deployment model.



Name


Type


Description


format


string


Deployment model format.


name


string


Deployment model name.


version


string


Deployment model version.


DeploymentProperties

Properties of Cognitive Services account deployment.



Name


Type


Description


model


Deployment Model


Properties of Cognitive Services account deployment model.


provisioningState


Deployment ProvisioningState


Gets the status of the resource at the time the operation was called.


scaleSettings


Deployment ScaleSettings


Properties of Cognitive Services account deployment model.


DeploymentProvisioningState

Gets the status of the resource at the time the operation was called.



Name Type Description


Accepted string


Creating string


Deleting string


Failed string


Moving string


Succeeded string


DeploymentScaleSettings

Properties of Cognitive Services account deployment model.



Name


Type


Description


capacity


integer


Deployment capacity.


scaleType


Deployment ScaleType


Deployment scale type.


DeploymentScaleType

Deployment scale type.



Name Type Description


Manual string


Standard string


ErrorAdditionalInfo

The resource management error additional info.



Name


Type


Description


info


object


The additional info.


type


string


The additional info type.


ErrorDetail

The error detail.



Name


Type


Description


additionalInfo


ErrorAdditional Info[]


The error additional info.


code


string


The error code.


details


ErrorDetail[]


The error details.


message


string


The error message.


target


string


The error target.


ErrorResponse

Error response



Name


Type


Description


error


ErrorDetail


The error object.


systemData

Metadata pertaining to creation and last modification of the resource.



Name


Type


Description


createdAt


string


The timestamp of resource creation (UTC).


createdBy


string


The identity that created the resource.


createdByType


createdByType


The type of identity that created the resource.


lastModifiedAt


string


The timestamp of resource last modification (UTC)


lastModifiedBy


string


The identity that last modified the resource.


lastModifiedByType


createdByType


The type of identity that last modified the resource.

Azure.AI.OpenAI Namespace

Reference


Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.


Classes


ChatChoice

The representation of a single completions result for a chat completions request.

ChatCompletions

A representation of a full response to a chat completions request.

ChatCompletionsOptions

The configuration information used for a chat completions request.

ChatMessage

A single, role-attributed message within a chat completion interaction.

Choice

Choice model within completion response.

Completions

Expected response schema to completion request.

CompletionsLogProbability

LogProbs model within completion choice.

CompletionsOptions

Post body schema to create a prompt completion from a deployment.

CompletionsUsage

Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

EmbeddingItem

Expected response schema to embeddings object list item request.

Embeddings

Expected response schema to embeddings request.

EmbeddingsOptions

Schema to create a prompt completion from a deployment.

EmbeddingsUsage

Measurment of the amount of tokens used in this request and response.


OpenAIClient Azure OpenAI APIs for completions and search.

OpenAIClientOptions Client options for OpenAIClient.

StreamingChatChoice

StreamingChatCompletions

StreamingChoice

StreamingCompletions


Structs


ChatRole A description of the intended purpose of a message within a chat completions interaction.

Enums


OpenAIClientOptions.Service The version of the service to use. Version

com.azure.ai.openai

Reference

Package: com.azure.ai.openai

Maven Artifact: com.azure:azure-ai-openai:1.0.0-beta.1


Package containing the classes for OpenAI. Azure OpenAI APIs for completions and search.


Classes


OpenAIAsyncClient Initializes a new instance of the asynchronous OpenAIClient type.

OpenAIClient Initializes a new instance of the synchronous OpenAIClient type.

OpenAIClientBuilder A builder for creating a new instance of the OpenAIClient type.

Enums


OpenAIServiceVersion Service version of OpenAIClient.

@azure/openai package

Reference


Classes


AzureKeyCredential A static-key-based credential that supports updating the

underlying key value.


OpenAIClient A client for interacting with Azure OpenAI.


The client needs the endpoint of an OpenAI resource and an authentication method such as an API key or token. The API key and endpoint can be found in the OpenAI resource page. They will be located in the resource's Keys and Endpoint page.


Examples for authentication:


API Key


JavaScript


import { OpenAIClient } from "@azure/openai"; import { AzureKeyCredential } from

"@azure/core-auth";


const endpoint = "<azure endpoint>";

const credential = new AzureKeyCredential("<api key>");


const client = new OpenAIClient(endpoint, credential);


Azure Active Directory


JavaScript


import { OpenAIClient } from "@azure/openai"; import { DefaultAzureCredential } from

"@azure/identity";


const endpoint = "<azure endpoint>"; const credential = new

DefaultAzureCredential();

const client = new OpenAIClient(endpoint, credential);

OpenAIKeyCredential The OpenAIKeyCredential class represents an OpenAI API key

and is used to authenticate into an OpenAI client for an OpenAI endpoint.


Interfaces


ChatChoice

The representation of a single prompt completion as part of an overall chat completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

ChatCompletions

Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

ChatMessage

A single, role-attributed message within a chat completion interaction.

Choice

The representation of a single prompt completion as part of an overall completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

Completions

Representation of the response data from a completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

CompletionsLogProbability Model

Representation of a log probabilities model for a completions generation.

CompletionsUsage

Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

EmbeddingItem

Representation of a single embeddings relatedness comparison.

Embeddings

Representation of the response data from an embeddings request. Embeddings measure the relatedness of text strings and are commonly used for search, clustering, recommendations, and other similar scenarios.

EmbeddingsUsage

Measurement of the amount of tokens used in this request and response.


GetChatCompletionsOptions

GetCompletionsOptions

GetEmbeddingsOptions

OpenAIClientOptions

RequestOptions


Type Aliases


ChatRole "system", "assistant", "user"

CompletionsFinishReason "stopped", "tokenLimitReached", "contentFiltered"

Azure Cognitive Services support and help options

Article • 07/22/2022 • 2 minutes to read


Are you just starting to explore the functionality of Azure Cognitive Services? Perhaps you are implementing a new feature in your application. Or after using the service, do you have suggestions on how to improve it? Here are options for where you can get support, stay up-to-date, give feedback, and report bugs for Cognitive Services.


Create an Azure support request



Explore the range of Azure support options and choose the plan that best fits,

whether you're a developer just starting your cloud journey or a large organization deploying business-critical, strategic applications. Azure customers can create and manage support requests in the Azure portal.

Azure portal

Azure portal for the United States government


Post a question on Microsoft Q&A

For quick and reliable answers on your technical product questions from Microsoft Engineers, Azure Most Valuable Professionals (MVPs), or our expert community, engage with us on Microsoft Q&A, Azure's preferred destination for community support.

If you can't find an answer to your problem using search, submit a new question to Microsoft Q&A. Use one of the following tags when you ask your question:

Cognitive Services


Vision


Computer Vision Custom Vision Face

Form Recognizer Video Indexer

Language

Immersive Reader

Language Understanding (LUIS) QnA Maker

Language service Translator

Speech


Speech service


Decision


Anomaly Detector Content Moderator Metrics Advisor Personalizer

Azure OpenAI


Azure OpenAI


Post a question to Stack Overflow


For answers on your developer questions from the largest community developer ecosystem, ask your question on Stack Overflow.

If you do submit a new question to Stack Overflow, please use one or more of the following tags when you create the question:

Cognitive Services


Vision


Computer Vision Custom Vision Face

Form Recognizer Video Indexer

Language


Immersive Reader

Language Understanding (LUIS)

QnA Maker Language service Translator

Speech


Speech service


Decision


Anomaly Detector Content Moderator Metrics Advisor Personalizer

Azure OpenAI


Azure OpenAI


Submit feedback

To request new features, post them on https://feedback.azure.com . Share your ideas for making Cognitive Services and its APIs work better for the applications you develop.

Cognitive Services


Vision


Computer Vision Custom Vision Face

Form Recognizer Video Indexer

Language


Immersive Reader

Language Understanding (LUIS) QnA Maker

Language service Translator

Speech


Speech service

Decision


Anomaly Detector Content Moderator Metrics Advisor Personalizer


Stay informed

Staying informed about features in a new release or news on the Azure blog can help you find the difference between a programming error, a service bug, or a feature not yet available in Cognitive Services.

Learn more about product updates, roadmap, and announcements in Azure Updates .

News about Cognitive Services is shared in the Azure blog . Join the conversation on Reddit about Cognitive Services.


Next steps


What are Azure Cognitive Services?



Additional resources


Documentation

Summarization language support - Azure Cognitive Services

Learn about which languages are supported by document summarization.

Fine Tunes - Create - REST API (Azure Cognitive Services)

Creates a job that fine-tunes a specified model from a given training file. Response includes details of the enqueued job including job status and hyper paramet

Summarize text with the conversation summarization API - Azure Cognitive Services

This article will show you how to summarize chat logs with the conversation summarization API.

Summarize text with the extractive summarization API - Azure Cognitive Services

This article will show you how to summarize text with the extractive summarization API.

What is document and conversation summarization (preview)? - Azure Cognitive Services

Learn about summarizing text.

Fine Tunes - REST API (Azure Cognitive Services)

Learn more about [Cognitive Services Fine Tunes Operations]. How to [Cancel,Create,Delete,Get,Get Events,List].

Quickstart: Use Document Summarization (preview) - Azure Cognitive Services

Use this quickstart to start using Document Summarization.

Deployments - Create - REST API (Azure Cognitive Services)

Learn more about Cognitive Services service - Creates a new deployment for the Azure OpenAI resource according to the given specification.

Show 5 more


Training


Learning path

Provision and manage Azure Cognitive Services - Training

Provision and manage Azure Cognitive Services

Certification

Microsoft Certified: Azure Developer Associate - Certifications

Azure developers design, build, test, and maintain cloud applications and services.